Vivek Ramaswamy, Developer in Toronto, ON, Canada

Vivek Ramaswamy

Verified Expert in Engineering

Bio

Vivek is an IT professional with 15 years of experience in designing and building systems, the last five years in big data systems. He excels in multiple tools, technologies, and programming languages, including SQL, .NET, Java, Scala, and JavaScript. Vivek has worked with data in Excel, VBA, Access, RDBMS, and distributed systems.

Portfolio

Pogo Technologies, Inc.
Amazon Web Services (AWS), PostgreSQL, Dagster, Snowflake, Python 3, Python...
Leading FX Trading Platform
Google BigQuery, Cloud Dataflow, MuleSoft, Apache Kafka, Java 8, Kdb+, OneTick...
Leading Insurance Broker
StreamSets, Amazon S3 (AWS S3), Cloudera, Impala, Big Data Architecture

Experience

Availability

Part-time

Preferred Environment

Apache Kafka, Apache Hive, Spark, SQL, Excel VBA, Apache Impala, Cloudera, Scala, Java 8, Python

The most amazing...

...thing I've done is improve the efficiency of data loaders while bringing more visibility into the process.

Work Experience

Data Engineer

2022 - 2022
Pogo Technologies, Inc.
  • Gathered requirements to understand the different data sources and their modes of data export. Performed quick proofs of concept to identify the appropriate tools for building the data pipelines.
  • Built standardized data pipelines in Dagster to source data from multiple sources and land it in Snowflake on a schedule. Routed run notifications to Slack channels for easy monitoring and oversight (a minimal pipeline sketch follows this entry).
  • Repurposed a live staging area to provide a production-like environment for wet runs and validations.
Technologies: Amazon Web Services (AWS), PostgreSQL, Dagster, Snowflake, Python 3, Python, Slack, SQL, ETL Tools, ETL
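
A minimal sketch of a scheduled Dagster pipeline of this kind, assuming a PostgreSQL source and a Snowflake destination; the extract/load helpers, table names, and schedule are illustrative placeholders, not the actual project code.

```python
import pandas as pd
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job


def read_orders_from_postgres() -> pd.DataFrame:
    # Placeholder for the real extract, e.g. pd.read_sql(...) against the source database.
    return pd.DataFrame({"order_id": [1], "amount": [100.0]})


def load_to_snowflake(df: pd.DataFrame, table: str) -> None:
    # Placeholder for the real Snowflake load (e.g. write_pandas from the Snowflake connector).
    print(f"loading {len(df)} rows into {table}")


@asset
def raw_orders() -> pd.DataFrame:
    """Pull the latest orders extract from the source PostgreSQL database."""
    return read_orders_from_postgres()


@asset
def orders_in_snowflake(raw_orders: pd.DataFrame) -> None:
    """Land the standardized orders data in Snowflake."""
    load_to_snowflake(raw_orders, table="ORDERS")


# Materialize both assets nightly; run notifications can then be routed to Slack
# through Dagster's alerting hooks.
nightly_job = define_asset_job(
    "nightly_orders_load", selection=["raw_orders", "orders_in_snowflake"]
)
nightly_schedule = ScheduleDefinition(job=nightly_job, cron_schedule="0 2 * * *")

defs = Definitions(
    assets=[raw_orders, orders_in_snowflake],
    schedules=[nightly_schedule],
)
```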

Software Engineer

2020 - 2021
Leading FX Trading Platform
  • Set up streaming pipelines in Google Cloud Dataflow to land data from on-premises, schemaless Kafka topics in BigQuery. Captured and stored schema drifts to ensure smooth runs (a minimal Beam sketch follows this entry).
  • Evaluated different tick database vendors for storing tick data. Ran POCs and compared the products' benchmark performance by setting up a greenfield environment and simulating common use cases.
  • Used MuleSoft dataflows to replace Informatica jobs.
  • Built a Java Spring framework-based wrapper service, secured behind Google IAP, to query Domo and send out data in CSV format for resellers. Rebuilt Domo dashboards based on pre-existing ones.
Technologies: Google BigQuery, Cloud Dataflow, MuleSoft, Apache Kafka, Java 8, Kdb+, OneTick, ExtremeDB, Apache Airflow, BigQuery, Spring Boot, Google Cloud Platform (GCP), SQL, ETL, ETL Tools, Domo, Apache Beam, Apache Maven
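
A minimal Apache Beam sketch of a Kafka-to-BigQuery streaming pipeline of this kind, written with the Python SDK for illustration (the production Dataflow jobs may have used the Java SDK); broker, topic, table, and schema names are placeholders.

```python
import json

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming pipeline; add Dataflow runner/project options when running on GCP.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        # Read raw (key, value) byte pairs from the on-premises Kafka topic.
        | "ReadTicks" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "broker:9092"},
            topics=["fx-ticks"],
        )
        # Parse each message value as JSON into a row dict.
        | "ParseJson" >> beam.Map(lambda kv: json.loads(kv[1].decode("utf-8")))
        # Append rows to the target BigQuery table.
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            table="my-project:market_data.fx_ticks",
            schema="symbol:STRING,price:FLOAT,event_time:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```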

Big Data Engineer | Associate Director

2019 - 2020
Leading Insurance Broker
  • Built ETL pipelines using StreamSets Data Collector to land data from various sources into the cloud data platform based on S3 and supported by Impala.
  • Evaluated the use of StreamSets Transformer as a complementary ETL tool for batch processing.
  • Assessed the use of Apache Airflow for orchestration and whether it would fit a multitenant environment (a minimal DAG sketch follows this entry).
Technologies: StreamSets, Amazon S3 (AWS S3), Cloudera, Impala, Big Data Architecture
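
A minimal Airflow DAG sketch of the kind of orchestration that was assessed here; the DAG id, task names, and the ingest/validate callables are hypothetical stand-ins for the real StreamSets trigger and Impala checks.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_policy_feed(**context):
    # Placeholder: kick off the StreamSets pipeline / land the feed in S3.
    print("ingesting feed")


def validate_in_impala(**context):
    # Placeholder: run row-count and schema checks against the Impala tables.
    print("validating load")


with DAG(
    dag_id="policy_feed_ingest",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_feed", python_callable=ingest_policy_feed)
    validate = PythonOperator(task_id="validate_load", python_callable=validate_in_impala)

    # Validate only after the feed has landed.
    ingest >> validate
```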

Senior Big Data Engineer

2018 - 2019
Leading French Investment Bank
  • Redesigned the streaming application to be more contextual and better aligned with the nature of the data.
  • Incorporated logging into ELK for near-real-time metrics and analysis.
  • Improved the deployment process, making it more efficient and independent using Unix helpers.
Technologies: Spark, Spark SQL, Spark Streaming, Apache Hive, Orc, ELK (Elastic Stack), StreamSets, Java 8, HDFS, HBase, Apache Kafka, Big Data Architecture, Java, Hadoop, Apache Maven

Senior Associate

2013 - 2018
Global Bank Leader in the Private and Investment Banking Space
  • Improved Dynamics CRM application performance and load time by identifying the bottleneck and applying technical and functional fixes. Scaled up the data loader by leveraging .NET multi-threading and Dynamics CRM's bulk handling capability.
  • Sped up Spark jobs by caching the Hive and Spark SQL tables that were reused across stages (see the snippet after this entry).
  • Used Akka Streams to improve the heavy file-loading process on the edge node with minimal resources, circumventing throughput and memory contention issues.
Technologies: Scala, .NET, Dynamics CRM 2011, Dynamics CRM 2013, Greenplum, Cloudera, Apache Hive, Spark, Spark SQL, Spark Streaming, HBase, Parquet, Informatica, Big Data Architecture, Microsoft SQL Server, Hadoop
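
An illustrative PySpark snippet showing the table-caching approach mentioned above; table and column names are placeholders, and the original jobs were written in Scala.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("cached-reporting-job")
    .enableHiveSupport()
    .getOrCreate()
)

# Cache a Hive table that several downstream aggregations reuse, so it is
# scanned from HDFS only once instead of on every action.
spark.sql("CACHE TABLE trades_daily")

summary_by_desk = spark.sql(
    "SELECT desk, SUM(notional) AS total_notional FROM trades_daily GROUP BY desk"
)
summary_by_ccy = spark.sql(
    "SELECT currency, COUNT(*) AS trade_count FROM trades_daily GROUP BY currency"
)

summary_by_desk.show()
summary_by_ccy.show()

# Release the memory once the job no longer needs the cached copy.
spark.sql("UNCACHE TABLE trades_daily")
```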

IT Analyst

2010 - 2013
Leading Consulting Firm
  • Identified the bottlenecks and improved the performance of SQL queries.
  • Migrated the application from Dynamics CRM 4.0 to Dynamics CRM 2011. Revamped the application to use the new, version-breaking SDK while also incorporating Silverlight to display a new custom UI.
  • Integrated IBM MQ with VB 6 to replace a screen scraping procedure.
  • Built an Excel-based tool using macros to capture data from multiple Excel files for reporting.
Technologies: Dynamics CRM 2011, Microsoft Dynamics CRM, Dynamics CRM Plugins, SQL Server 2008 R2, .NET, Microsoft Silverlight, Visual Basic 6 (VB6), IBM MQ, Microsoft SQL Server, Microsoft Excel, Visual Basic, Visual Basic for Applications (VBA), Excel Macros

Research Engineer

2008 - 2009
VoIP Service Provider
  • Enabled system integration between Linux back-end systems and Windows-based front-end systems.
  • Built a Flash-based SIP softphone, integrated into the browser used in a Windows-based MIS system, to display agent availability and place internal SIP calls directly.
  • Optimized the internal system by migrating it from VB 5 to VB 6.
  • Created an Excel-based reconciliation tool using VBA/Macros to highlight and report errors for billing.
Technologies: Asterisk, C++, Visual Basic 6 (VB6), PostgreSQL, Slackware, Flash ActionScript, ActionScript 3, Session Initiation Protocol (SIP), Visual Basic, Visual Basic for Applications (VBA), Microsoft Excel, Excel Macros

Batch Data Warehouse System to Real-time Data Lake Migration

This project aimed to move away from end-of-day, batch-style feeds from a data warehousing system to a data lake system updated in real time. The system was meant to act as a data source for all downstream applications that needed live data for functional and analytical use cases.

We used a CDC tool to move database feed updates into Kafka and wrote Spark streaming applications to process and store the data in HBase. There was also a caching system outside the data lake to facilitate faster access to the data.
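
A minimal PySpark Structured Streaming sketch of the Kafka-to-HBase leg of this pipeline; the topic, schema, and HBase writer are placeholders, and the original applications were JVM-based Spark Streaming jobs.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("cdc-to-datalake").getOrCreate()

# Shape of the CDC messages published to Kafka (illustrative).
cdc_schema = StructType([
    StructField("account_id", StringType()),
    StructField("balance", StringType()),
    StructField("updated_at", TimestampType()),
])

updates = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "cdc.accounts")
    .load()
    .select(from_json(col("value").cast("string"), cdc_schema).alias("row"))
    .select("row.*")
)


def write_batch_to_hbase(batch_df, batch_id):
    # Placeholder for the real HBase writer (e.g. an HBase/Phoenix connector);
    # each micro-batch would be upserted keyed by account_id.
    batch_df.show(truncate=False)


query = (
    updates.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/cdc-accounts")
    .foreachBatch(write_batch_to_hbase)
    .start()
)
query.awaitTermination()
```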

OUTCOME
As a result of this migration, critical downstream applications had access to data in real time. They no longer had to wait T+1 days for a feed or deal with stale data when batch processing failed. The near-real-time feeds also created new opportunities to identify misuse and potential cross-sellable products.

On-premise Data Lake for Capital Markets Data

An on-premises data lake based on Hortonworks Data Platform facilitated storing data for corporate and investment banks dealing in capital markets. This data lake provided the necessary data for computing, tweaking, testing, and calibrating risk models and strategies. Different front-office application systems would send data and files via Kafka to the platform. Spark-based batch and streaming applications would process and transform the data, store it in ORC format in HDFS, and expose it via Hive tables for consumption. The application logs were also integrated into ELK to enable live monitoring of the applications.
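
An illustrative PySpark sketch of the write path described above: streaming records from Kafka into ORC files on HDFS and exposing them through an external Hive table. Paths, topic, database, and schema names are placeholders; the original jobs ran on the JVM.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = (
    SparkSession.builder.appName("capital-markets-lake")
    .enableHiveSupport()
    .getOrCreate()
)

trade_schema = StructType([
    StructField("trade_id", StringType()),
    StructField("instrument", StringType()),
    StructField("notional", DoubleType()),
])

# Expose the landing path to consumers as an external Hive table over ORC files.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS trades_raw (
        trade_id STRING, instrument STRING, notional DOUBLE
    )
    STORED AS ORC
    LOCATION 'hdfs:///data/lake/trades'
""")

trades = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "front-office.trades")
    .load()
    .select(from_json(col("value").cast("string"), trade_schema).alias("t"))
    .select("t.*")
)

# Continuously land the stream as ORC files under the Hive table's location.
query = (
    trades.writeStream.format("orc")
    .option("path", "hdfs:///data/lake/trades")
    .option("checkpointLocation", "hdfs:///checkpoints/trades")
    .start()
)
query.awaitTermination()
```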

OUTCOME
The project led to the collation of organization-wide trade data and provided an environment for scientists, actuaries, and risk modelers to analyze, test, and tweak their existing and new models.

Evaluation of Tick Databases

Evaluation, POC, and benchmark comparison of different databases capable of handling tick data. TimescaleDB, Kdb+, eXtremeDB, and OneTick were evaluated. The databases ran on common hardware and were loaded with sample data, then benchmarked for query performance, ease of use, scalability, and cost. The evaluation and POC provided all the necessary data points and enabled the business to make a more informed decision.
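
A toy sketch of the benchmarking approach: run the same aggregation against each candidate store and record wall-clock timings. The connection details and query are placeholders, and only the TimescaleDB case is shown here (it speaks the PostgreSQL protocol); the real evaluation used vendor-specific clients for Kdb+, eXtremeDB, and OneTick.

```python
import time

import psycopg2  # TimescaleDB is queried over the PostgreSQL protocol

QUERY = """
    SELECT symbol, date_trunc('minute', ts) AS minute, avg(price) AS avg_price
    FROM ticks
    WHERE ts >= now() - interval '1 day'
    GROUP BY symbol, minute
"""


def time_query(conn, query: str, runs: int = 5) -> float:
    """Return the average execution time in seconds over several runs."""
    timings = []
    with conn.cursor() as cur:
        for _ in range(runs):
            start = time.perf_counter()
            cur.execute(query)
            cur.fetchall()
            timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


if __name__ == "__main__":
    conn = psycopg2.connect("host=localhost dbname=marketdata user=bench")
    print(f"timescaledb avg: {time_query(conn, QUERY):.3f}s")
```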

Education

2006 - 2008

Master's Degree in Information Technology

University of Mumbai - Mumbai, India

2003 - 2006

Bachelor's Degree in Information Technology

University of Mumbai - Mumbai, India

Libraries/APIs

Spark Streaming, Protobuf, PySpark

Tools

Spark SQL, ELK (Elastic Stack), Cloudera, Impala, Apache Airflow, IntelliJ IDEA, Apache Impala, BigQuery, Apache Maven, Cloud Dataflow, Asterisk, Microsoft Dynamics CRM, Microsoft Silverlight, IBM MQ, Jenkins, Stash, Git, Control-M, Slack, Tableau, Domo, Apache Beam, Microsoft Excel, Microsoft Power BI

Languages

Orc, SQL, Excel VBA, Java 8, Scala, C++, Python, Power Query M, Visual Basic 6 (VB6), Flash ActionScript, ActionScript 3, Java, Snowflake, Python 3, Visual Basic, Visual Basic for Applications (VBA)

Paradigms

ETL, Parallel Computing

Platforms

Apache Kafka, Hortonworks Data Platform (HDP), MuleSoft, Slackware, Google Cloud Platform (GCP), Amazon Web Services (AWS), Databricks

Storage

HDFS, Apache Hive, Database Management Systems (DBMS), RDBMS, HBase, Microsoft SQL Server, Kdb+, ExtremeDB, Amazon S3 (AWS S3), PostgreSQL, Greenplum, SQL Server 2008 R2, Neo4j

Frameworks

Spark, .NET, Apache Spark, Spring Boot, Hadoop

Other

StreamSets, Data Engineering, Google BigQuery, Distributed Systems, ELT, Big Data Architecture, APIs, Excel Macros, OneTick, Dynamics CRM 2011, Dynamics CRM 2013, Parquet, Informatica, Session Initiation Protocol (SIP), Dynamics CRM Plugins, Dagster, ETL Tools, GraphDB
