Pedro Henrique Rocha Moy, Developer in Miami, FL, United States

Pedro Henrique Rocha Moy

Verified Expert in Engineering

Machine Learning Developer

Miami, FL, United States

Toptal member since April 25, 2019

Bio

Pedro is a business-oriented, seasoned data scientist and data engineer with experience building and deploying distributed data pipelines and machine learning models in production at scale. His work covers the entire data lifecycle: the design, construction, optimization, deployment, and monitoring of data architectures and machine learning models. Pedro focuses on delivering solutions that are robust to changes in environment and data and flexible enough to accommodate changing business requirements.

Portfolio

Rocha Moy Trading
Python, Julia, AWS, Options Trading, APIs, Web Scraping, Probability Theory...
Self-employed
Scikit-learn, SpaCy, NLP, Generative Pre-trained Transformers (GPT)...
Toptal Client
Python, Amazon Elastic MapReduce (EMR), Spark, Snowflake

Experience

Availability

Full-time

Preferred Environment

Python, Scala, Amazon Web Services (AWS), Data Engineering, Data Science, Machine Learning, Big Data, Software Architecture

The most amazing...

...systems I've built are algorithmic and probabilistic trading systems. With a limited view of the world, probabilities are essential tools in risk management.

Work Experience

Chief Architect

2017 - PRESENT
Rocha Moy Trading
  • Developed the API for probabilistic and algorithmic options trading with Interactive Brokers and TD Ameritrade. Specialties include data integration, task automation, portfolio simulations, risk mitigation, and strategy validation.
  • Integrated many data sources, ranging from APIs to web scraping.
  • Fully automated trade execution, trade scheduling, and the release of funds for trading.
Technologies: Python, Julia, AWS, Options Trading, APIs, Web Scraping, Probability Theory, Machine Learning, Simulations, Data Integration

Lead Data Scientist

2021 - 2022
Self-employed
  • Designed, implemented, and deployed different natural language processing models.
  • Worked with stakeholders to understand use cases, the pathway to product development, and implementation using deployed models.
  • Mentored and supported junior data scientists on the team.
Technologies: Scikit-learn, SpaCy, Generative Pre-trained Transformers (GPT), NLP, Neural Network, XGBoost

Enterprise Lead Data Architect - Contractor

2020 - 2022
Toptal Client
  • Handled the architecture, development, and automation of distributed computing pipelines and data storage in the cloud for the enterprise.
  • Automated scalable infrastructure in the cloud to respond to development and consumer demand.
  • Co-managed and supervised a team of engineers, including designing and delegating tasks, mentoring, and overseeing work.
Technologies: Python, Amazon Elastic MapReduce (EMR), Spark, Snowflake

Enterprise Senior ETL and Data Engineer - Contractor

2019 - 2020
Toptal Client
  • Designed, implemented, and deployed to production full-fledged distributed ETL jobs using the Spark Scala API.
  • Worked with various sources and sinks of data, including disparate files, Hive tables, Mongo collections, and Kafka brokers.
  • Served as the senior engineer and tech lead of the team, strengthening engineering and development processes, improving software quality control, and helping design stories for sprints.
Technologies: Oracle SQL, DocumentDB, Scala, Python, MongoDB, Spark, Apache Kafka, Hadoop

Hadoop Proof of Concept for Atmospheric Sciences Project - Contractor

2019 - 2020
Toptal Client
  • Built a cluster from scratch, adhering to the client's need to work with their home cluster.
  • Designed and implemented generic and specific data architectures that met the client's query complexity and performance needs.
  • Built PySpark and Python software layers of abstraction to allow the client to build on top of the current infrastructure.
Technologies: PySpark, Hadoop

Research Data Engineer

2018 - 2019
Nicklaus Children’s Hospital
  • Developed existing analytical and data workflows for users of R, Python, and Impala, establishing best engineering practices.
  • Provided ad hoc and systematically developed ETL and big data pipelines, validation, and integration of varying data sources.
  • Served as the research department's liaison to the IT and BI departments, providing guidance and expertise on analytical and data needs.
Technologies: Apache, Hadoop, Spark, Scala, Python

Technical Advisor

2018 - 2018
Insight Data Science
  • Worked with fellows and their data engineering projects on problem definition, systems architecture, and execution.
  • Advised on technologies such as Spark, Kafka, Redis, HBase, Cassandra, and PostgreSQL.
  • Conducted mock interviews with fellows on scalability concepts, algorithms, and CS fundamentals.
Technologies: PostgreSQL, Cassandra, HBase, Redis, Apache Kafka, Spark

Senior Software Engineer

2016 - 2017
NexHealth
  • Developed and deployed software to the client's site to perform data collection and server sync.
  • Performed both database and web-based data integrations of electronic medical records back to NexHealth servers.
  • Developed a smart SMS response system allowing the user to interact with NexHealth products via SMS.
Technologies: Redis, PostgreSQL, Big Data Architecture, JavaScript, Scala, Python, Ruby on Rails

Data Scientist

2016 - 2016
QuaEra Insights
  • Served as the lead data scientist on a consulting project, overseeing data management and modeling strategy.
  • Used natural language processing to transform unstructured data into features and extract business intelligence.
  • Built a recommendation engine expressed as business rules, potentially yielding savings on up to 50% of the business.
Technologies: Python

Data Engineering Fellow

2015 - 2015
Insight Data Science
  • Built themidgame-tube, a platform designed to discover YouTube influencers for brand names worldwide.
  • Deployed Spark on Amazon EMR with HBase, processing and ingesting billions of data tuples.
  • Attained linear scalability in performance tests with up to 20 nodes.
Technologies: AWS, Bootstrap, Hadoop, Big Data Architecture, Python

Data Analyst

2015 - 2015
Cartesian
  • Aided managed analytics efforts, promoting best practices within batch workflows and data management.
  • Conducted independent research into big data workflows considering data mining and BI integration.
  • Built short data pipelines that consumed APIs, transformed and loaded the data, and exposed data connections to BI tools.
Technologies: Alteryx, PostgreSQL, R, Python, Data Science, Managed Analytics

Data Analytics Engineer

2013 - 2015
Daktari Diagnostics
  • Worked as the lead developer of mainstream data processing and data analysis applications in Python for Windows/Mac.
  • Developed a calibration model for the Daktari CD4 testing device, improving the system's accuracy by 20-30%.
  • Deployed machine learning models, embedded in standalone applications, to end users for data classification.
Technologies: SQL Server, JMP, SAS, R, Python

Continuous Edging and Hedging Equity Trading Strategy

https://docs.google.com/presentation/d/1zkbfErfwbJvGBXFj9UWKDvq99wkj6EBvqniA4yFNu68/edit?usp=sharing
This investigation explores reinforcement learning agents as a means of generating a diversified set of strategies that guarantees the existence of optimal strategies for any market condition. The preliminary results demonstrate that the pool of agents provides the desired diversity, transforming the algorithmic trading challenge into a selection problem (which may be tackled with AI methods such as evolutionary computing).

2021 - 2022

Executive MBA in Business Administration

University of Miami - Miami

2015 - 2017

Master's Degree in Computer Science (Machine Learning)

Georgia Institute of Technology - Atlanta, GA

2010 - 2012

Master's Degree in Earth Science and Engineering (Geophysics)

King Abdullah University of Science and Technology - Saudi Arabia

2008 - 2010

Bachelor's Degree in Mechanical Engineering

University of Massachusetts Lowell - Lowell, MA

Libraries/APIs

Microsoft Development, PySpark, TensorFlow, PyTorch, Scikit-learn, XGBoost, Dask, SpaCy

Tools

ChatGPT, Amazon Elastic MapReduce (EMR), Spark, JMP, Apache, Git, Gensim

Languages

Python, Julia, Scala, SQL, R, SAS, JavaScript, Bash, Snowflake

Storage

NoSQL, MongoDB, Oracle SQL, SQL Server, Redis, Cassandra, PostgreSQL, HBase, Hadoop, Data Integration

Paradigms

Functional Programming, Parallel Programming, Distributed Computing

Platforms

Docker, Jupyter Notebook, Apache Kafka, Alteryx, Linux, AWS

Frameworks

Bootstrap, Ruby on Rails, Spark, Big Data Architecture, Flask, Hadoop, Streamlit Development

Industry Expertise

Accounting

Other

Machine Learning, Distributed Systems, GPT-4, Financial Modeling, UI Development, APIs, Data Architecture, Data Modeling, DocumentDB, Dash, Deep Learning, NLP, Data Science, Data Engineering, Artificial Intelligence, Algorithms, Optimization, Reinforcement Learning, Time Series Analysis, Forecasting, Cloud Engineering, Numerical Optimization, Sentiment Analysis, Neural Network, Options Trading, Web Scraping, Probability Theory, Simulations, Finance, Law, Entrepreneurship, Leadership, Big Data Architecture, Software Architecture, Generative Pre-trained Transformers (GPT), Managed Analytics
