Alex Baretta

Technology Leader/Developer in San Jose, CA, United States

Member since March 26, 2020
Alex is a versatile technologist with a deep academic background in computer science (École Polytechnique), electrical engineering (Politecnico di Milano), and quantitative finance (Bocconi University). He has experience building search engines (Wink/MyLife), quantitative insurance risk and pricing models (The Climate Corporation), trading algorithms (Xambala/Final Strategies), and stochastic optimization algorithms (KCG/Virtu). Alex is also the co-founder of two startups.

Location

San Jose, CA, United States

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Automation, DevOps, Bash, Programming, OCaml, Distributed Computing, Condor, Big Data, Hadoop, Search, Algorithmic Trading, C, Matplotlib, Scikit-learn, Python

The most amazing...

...project I've worked on is a machine learning algorithm used to predict loss distributions for efficient insurance pricing.

Employment

  • Head of High-frequency Trading

    2020 - 2021
    TickUp (HFT / Hedge Fund)
    • Developed a low-latency C++ book builder and market feed processor for Nasdaq TotalView-ITCH. Later, my engineering team added support for all other US equities exchanges and their respective data protocols.
    • Hired a team of eight top-caliber algorithmic trading quants and C++ engineers.
    • Designed a multi-tactic HFT strategy for US equities whose tactics included passive orders at the near side with signal-driven cancellation, hidden midpoint peg orders, and aggressive IOC orders.
    • Developed a broad array of statistically powerful predictive models to inform the strategy's trading activity.
    Technologies: C++, Director, Machine Learning, Data Science, Big Data, Data, Leadership, Trading, Hiring, Data Engineering, ETL, TensorFlow, Kubernetes, AWS S3, Team Leadership
  • CTO (contract)

    2017 - 2019
    Zeguro (Cyber Security / Cyber Insurance)
    • Architected the product stack: front end, middle tier, back end, and database.
    • Designed the AWS execution environment around ECS with Fargate and implemented an infrastructure-as-code build and deployment tool in Python using Boto 3.
    • Built a generic database layer, blending the best of relational and NoSQL technology by leveraging PostgreSQL's native JSON support and GIN indexing technology.
    • Hired and mentored a high-performing engineering team.
    • Architected the data collection process to support the company's cyber insurance AI model.
    • Researched and developed a prototype of a "real-time" AI approach to cyber insurance underwriting based on information retrieval (i.e., inverted index) techniques, which requires no training stage other than indexing each new observation (an illustrative sketch follows at the end of this Employment section).
    Technologies: Amazon Web Services (AWS), AWS, Python, Boto 3, PostgreSQL, Spring, React, Express.js, Node.js, TypeScript, JavaScript, Python 3, Leadership, Director, Data, Atlassian, GitLab, Kubernetes, Amazon Cognito, AWS DynamoDB, AWS S3, AWS CloudWatch, Team Leadership
  • Quantitative Strategist, US Equities

    2016 - 2018
    KCG Holdings, Inc. | Virtu Financial, Inc. (HFT)
    • Developed a framework for reinforcement learning in the context of high-frequency trading.
    • Built a stochastic optimization framework for the parameters of KCG's flagship algorithmic trading strategy.
    • Contributed to developing a high-throughput trading simulator used to evaluate the performance of various market-making strategies and predictors.
    Technologies: Optimization, Reinforcement Learning, Pandas, Python, OCaml, C++, SQL, Machine Learning, Python 3, Deep Learning, ETL
  • VP of Data Science | Chief Technology Officer

    2014 - 2016
    Lumity, Inc. (Health Insurance)
    • Developed a novel machine learning algorithm for non-parametric conditional density estimation, used for quantitative risk management and efficient pricing of insurance products.
    • Built a machine learning model to predict the out-of-pocket expenses of an individual based on the features of a health insurance plan and the individual's risk profile.
    • Architected the benefits enrollment platform at the heart of Lumity's benefits brokerage offering.
    • Hired and mentored a high-functioning engineering and data science team.
    Technologies: Data Science, JavaScript, Python, Scala, OCaml, SQL, Machine Learning, AWS S3, AWS CloudWatch, Team Leadership
  • Research Engineer | High Frequency Trading

    2012 - 2014
    Xambala Capital (HFT)
    • Built an extensive library of signals derived from the raw market event feeds for machine learning applications.
    • Developed low-latency market predictors using GNU R and glmnet. The resulting models were blazingly fast to evaluate, as required by high-frequency trading, and their statistical performance was competitive with far more complex non-linear models.
    • Maintained and extended a stock exchange simulator supporting feeds from all major US exchanges, including Nasdaq, NYSE, Arca, BATS, and Direct Edge, and implemented the distinct semantics of each exchange's matching engine.
    • Constructed an order router translating from Xambala's native ordering protocol to the protocols of all major US equities exchanges and several dark pools.
    • Built a family of two-sided, liquidity-providing market-making algorithms for tight-spread, high-volume stocks.
    Technologies: Matplotlib, NumPy, Python, GNU, C++, OCaml, SQL, Machine Learning, Python 3, Deep Learning
  • Lead Engineer | Pricing and Risk Management

    2011 - 2012
    The Climate Corporation (Weather Insurance)
    • Built Climate's weather-based crop insurance pricing algorithm, which ran a distributed Monte Carlo simulation of predicted weather patterns over the insured farm throughout the growing season.
    • Developed a Hadoop MapReduce-based algorithm to produce the quantitative risk reports on the entire crop insurance portfolio for the reinsurance partners.
    • Researched the feasibility of treating an insurance policy's payout calculation as data by representing it as a first-class function in a functional programming language, namely Clojure (also sketched at the end of this Employment section).
    Technologies: Hadoop, Clojure, Java, SQL, Machine Learning, Python 3, Leadership, Data, Team Leadership
  • Senior Engineer | Search Technology

    2009 - 2011
    Wink.com | Mylife.com (Social Media)
    • Built a Hadoop MapReduce indexing algorithm that processed the document corpus into a set of Lucene indexes for the search engine cluster; it was 100 times faster than the previous version, which was based on a static cluster of Lucene indexing servers.
    • Developed a real-time Lucene read-write indexing service that made newly acquired data immediately accessible through the search engine, complementing the main search cluster, which served the static document corpus and was updated infrequently.
    • Leveraged Lucene inverted index technology to support k-nearest neighbors predictive modeling.
    Technologies: Hadoop, Apache Lucene, Java, OCaml, Data, Big Data, Search, Web Search, Distributed Computing, Leadership, Team Leadership
  • Freelance Software and Industrial Automation Engineer

    2001 - 2009
    Self-employed
    • Organized and led the Zoonoses project at the European Food Safety Authority (EFSA), an agency of the European Union. The Zoonoses project is EFSA's most important IT project, handling data collection from the 25 member states.
    • Designed and programmed a fully automated sandblasting industrial robot as part of the assembly line of a company manufacturing concrete mixer trucks.
    • Developed an HTML and JavaScript web UI for an industrial cutting robot. The control software included an algorithm to optimize the layout of the pieces cut from the raw material, minimizing the total amount of material required.
    Technologies: Java, Oracle, PHP, SQL, HTML, JavaScript, MySQL, PostgreSQL, PLC, Computer Vision
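
The Zeguro and Wink.com entries above both mention using an inverted index to support k-nearest-neighbors prediction, where "training" amounts to indexing each new observation. The following is a minimal, hypothetical OCaml sketch of that idea; the observation type, feature tokenization, and overlap-count scoring are invented for illustration and are not drawn from either codebase.

    (* Hypothetical sketch of k-nearest-neighbors prediction backed by an
       inverted index: "training" is just indexing each observation's
       feature tokens. Invented example; not code from Zeguro or Wink. *)
    module StrMap = Map.Make (String)

    type observation = { features : string list; label : float }

    (* Indexing stage: map each feature token to the ids of the
       observations that contain it. *)
    let build_index (obs : observation array) : int list StrMap.t =
      let index = ref StrMap.empty in
      Array.iteri
        (fun id o ->
          List.iter
            (fun tok ->
              let ids = Option.value ~default:[] (StrMap.find_opt tok !index) in
              index := StrMap.add tok (id :: ids) !index)
            o.features)
        obs;
      !index

    (* Keep the first n elements of a list. *)
    let rec take n lst =
      match lst with
      | [] -> []
      | _ when n <= 0 -> []
      | x :: tl -> x :: take (n - 1) tl

    (* Prediction: score candidates by how many query tokens they share,
       then average the labels of the top-k matches. *)
    let knn_predict ~k index (obs : observation array) (query : string list) =
      let scores = Hashtbl.create 64 in
      List.iter
        (fun tok ->
          match StrMap.find_opt tok index with
          | None -> ()
          | Some ids ->
              List.iter
                (fun id ->
                  let s = try Hashtbl.find scores id with Not_found -> 0 in
                  Hashtbl.replace scores id (s + 1))
                ids)
        query;
      let ranked =
        Hashtbl.fold (fun id s acc -> (id, s) :: acc) scores []
        |> List.sort (fun (_, a) (_, b) -> compare b a)
        |> take k
      in
      match ranked with
      | [] -> None
      | _ ->
          let sum = List.fold_left (fun acc (id, _) -> acc +. obs.(id).label) 0.0 ranked in
          Some (sum /. float_of_int (List.length ranked))

A production system built on Lucene would score matches with TF-IDF or BM25 rather than a raw overlap count, but the structure is the same: look up the query's tokens, accumulate scores per candidate, and average the labels of the top k.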
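
The Climate Corporation entry above combines two ideas: pricing by Monte Carlo simulation of weather outcomes and representing a policy's payout calculation as a first-class function. The following hypothetical OCaml sketch shows how the two fit together (the original work used Clojure); the policy terms and the one-field weather model are invented for the example.

    (* Hypothetical sketch: an insurance payout rule represented as a
       first-class function, priced by averaging it over Monte Carlo
       weather simulations. Invented example; not production code. *)
    type weather_outcome = { rainfall_mm : float }

    (* A policy's payout rule is just a function from a simulated outcome
       to a payout amount, so policies can be stored and composed like data. *)
    type payout_rule = weather_outcome -> float

    (* Example rule: pay proportionally to a rainfall deficit, up to a cap. *)
    let drought_cover ~trigger_mm ~rate_per_mm ~cap : payout_rule =
     fun outcome ->
      let deficit = max 0.0 (trigger_mm -. outcome.rainfall_mm) in
      min cap (rate_per_mm *. deficit)

    (* Expected payout: average the rule over simulated growing-season outcomes. *)
    let expected_payout (rule : payout_rule) (simulations : weather_outcome list) =
      let total = List.fold_left (fun acc o -> acc +. rule o) 0.0 simulations in
      total /. float_of_int (List.length simulations)

Because the payout rule is an ordinary function, the same simulation machinery can price any policy without special-casing each product.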

Experience

  • EigenDog: Stochastic Gradient Boosting Learner Written in OCaml
    https://github.com/alexbaretta/dawg/tree/dawg2

    Friedman's stochastic gradient boosting machine (S-GBM) is a powerful machine learning algorithm that models the training dataset through an ensemble of decision trees. Like deep learning, S-GBM is a universal function approximator. Unlike deep learning, where the network training process involves gradient descent on the neural network's parameters, S-GBM relies on a gradient descent in "functional space." Every step of the algorithm constructs a piecewise function in the form of a decision tree, whose addition to the model maximizes the reduction in training loss.

    This approach has several advantages over deep learning. In particular, it is possible to construct a cross-validation path, showing the tradeoff between variance and bias as a function of the number of trees in the ensemble. Early termination of the algorithm based on the cross-validation path obviates the need to decide the algorithm's hyperparameters ahead of time, in contrast with deep learning, where the network topology must be fixed in advance. S-GBM works remarkably well for structured data with a large number of categorical or ordinal, but not necessarily metric, variables.

    Dawg is an efficient implementation of S-GBM in OCaml. I worked on it from 2016 to 2017.
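
    To make the functional-gradient-descent step concrete, here is a minimal, hypothetical OCaml sketch of the S-GBM training loop for squared-error loss. It is not code from the dawg repository: fit_tree is a stand-in for a real regression-tree learner, and the ensemble simply accumulates trees fitted to the residuals, scaled by a learning rate.

      (* Minimal sketch of the S-GBM training loop for squared-error loss.
         Illustrative only; not taken from the dawg codebase. *)
      type tree = float array -> float  (* a fitted tree maps features to a prediction *)

      (* Placeholder regression-tree learner: here, a depth-0 "tree" predicting
         the mean of the targets. A real learner would grow a decision tree on
         the features to fit the targets. *)
      let fit_tree (_xs : float array array) (y : float array) : tree =
        let mean = Array.fold_left ( +. ) 0.0 y /. float_of_int (Array.length y) in
        fun _x -> mean

      (* The model: an initial constant plus learning-rate-scaled trees. *)
      type model = { f0 : float; trees : tree list; eta : float }

      let predict m x = List.fold_left (fun acc t -> acc +. m.eta *. t x) m.f0 m.trees

      (* Functional gradient descent: each step fits a tree to the negative
         gradient of the loss (for squared error, the residuals y - F(x))
         and adds it to the ensemble. *)
      let train ~n_trees ~eta xs y =
        let n = Array.length y in
        let f0 = Array.fold_left ( +. ) 0.0 y /. float_of_int n in
        let rec loop m k =
          if k = 0 then m
          else
            let residuals = Array.init n (fun i -> y.(i) -. predict m xs.(i)) in
            let t = fit_tree xs residuals in
            loop { m with trees = t :: m.trees } (k - 1)
        in
        loop { f0; trees = []; eta } n_trees

    In the real algorithm the number of trees is not fixed in advance: training stops early once the cross-validation path shows the validation loss flattening or rising.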

Skills

  • Languages

    SQL, OCaml, Python 3, C++, R, Scala, Bash, Java, C, PL/pgSQL, JavaScript, TypeScript, Clojure, Julia, PHP, HTML
  • Frameworks

    Hadoop, Spark, Express.js, Spring
  • Libraries/APIs

    Pandas, Apache Lucene, Scikit-learn, PySpark, TensorFlow, Matplotlib, Node.js, React, NumPy
  • Tools

    Boto 3, Amazon Cognito, GitLab, AWS CloudWatch, Atlassian
  • Paradigms

    Data Science, Distributed Computing, DevOps, Automation, ETL
  • Platforms

    Amazon Web Services (AWS), Docker, Linux, Azure, Kubernetes, Oracle, Director
  • Storage

    PostgreSQL, Data Validation, AWS S3, MySQL, PL/SQL, AWS DynamoDB
  • Other

    Machine Learning, Algorithmic Trading, Time Series Analysis, Stochastic Modeling, Numerical Optimization, Deep Neural Networks, Model Validation, Predictive Analytics, AWS, Gradient Boosting, Gradient Boosted Trees, Random Forests, Big Data, Big Data Architecture, Deep Learning, Team Leadership, Computer Vision, Leadership, Search, Programming, Reinforcement Learning, Optimization, GNU, Neural Networks, Condor, PLC, Fintech, Insurance Technology (Insurtech), Automated Trading Software, Statistics, Artificial Intelligence (AI), Classification, Decision Tree Regression, Decision Trees, Decision Tree Classification, Model Regularization, Cross-validation, Data, Trading, Hiring, Data Engineering, Web Search

Education

  • Master's Degree in Finance and Banking
    2009 - 2010
    SDA Bocconi School of Management, Bocconi University - Milano, Italy
  • Participated in an International Exchange Program (Non-degree Program) in Computer Science
    2000 - 2001
    École Polytechnique - Palaiseau, France
  • Engineer's Degree in Computer Engineering and Electrical Engineering
    1996 - 2001
    Politecnico di Milano - Milano, Italy
