Alex Baretta, Software Developer in San Jose, CA, United States

Member since March 16, 2020
Alex is a versatile technologist with a deep academic background in computer science (École Polytechnique), electrical engineering (Politecnico di Milano), and quantitative finance (Bocconi University). He has experience building search engines (Wink/MyLife), quantitative insurance risk and pricing models (The Climate Corporation), trading algorithms (Xambala/Final Strategies), and stochastic optimization algorithms (KCG/Virtu). Alex is also the co-founder of two startups.

Preferred Environment

Python (Pandas, Scikit-learn, Matplotlib), C++ (Algorithmic Trading), Lucene (Search Engine), Hadoop (Big Data), Condor (Distributed Computing), OCaml (Strongly Typed Functional Programming), Bash (DevOps), Python/Boto3 (AWS Automation)

The most amazing...

...project I've worked on is a machine learning algorithm which was used to predict loss distributions for efficient insurance pricing.


  • Advising CTO

    2017 - 2019
    • Architected the product stack (front end, middle tier, back end, and database).
    • Carefully designed the AWS execution environment based on ECS/Fargate and implemented an Infrastructure-as-Code automated build and deployment tool using Python/Boto3.
    • Built a generic database layer blending the best of relational and NoSQL technology by leveraging PostgreSQL's native JSON support and GIN indexing technology.
    • Hired and mentored a high-performing engineering team.
    • Architected the data collection process to support the company's cyber insurance AI model.
    • Researched and developed a prototype of a "real-time" AI approach based on information retrieval (i.e., inverted index) techniques, which requires no training stage other than indexing each new observation.
    Technologies: JavaScript (TypeScript, Node.js, Express.js, React), Java (Spring), PostgreSQL, AWS (Python/Boto 3)
  • Quantitative Strategist, US Equities

    2016 - 2018
    KCG Holdings, Inc. | Virtu Financial, Inc.
    • Developed a framework for reinforcement learning in the context of high-frequency trading.
    • Built a stochastic optimization framework for the parameters of KCG's flagship algorithmic trading strategy.
    • Contributed to the development of a high-throughput trading simulator to support performance evaluation of various market-making strategies and predictors.
    Technologies: C++, OCaml, Python, Pandas, Machine Learning, Reinforcement Learning, Stochastic Optimization
  • VP of Data Science | Chief Technology Officer

    2014 - 2016
    Lumity, Inc.
    • Developed a novel machine learning algorithm (non-parametric conditional density estimation) for quantitative risk management and efficient pricing of insurance products.
    • Built a machine learning model to predict the out-of-pocket expenses of an individual based on the features of a health insurance plan and the individual's risk profile.
    • Architected the benefits enrollment system at the heart of Lumity's benefits brokerage platform.
    • Hired and mentored a high-functioning engineering and data-science team.
    Technologies: OCaml, Scala, Python, JavaScript, Stochastic Gradient Boosting
  • Research Engineer – High Frequency Trading

    2012 - 2014
    Xambala Capital
    • Built an extensive library of signals derived from the raw market event feeds for machine learning applications.
    • Developed low-latency market predictors using GNU R and Glmnet. The resulting models were blazingly fast to evaluate, as required by high-frequency trading, and their statistical performance was competitive with far more complex non-linear models.
    • Maintained and extended a stock exchange simulator, supporting feeds from all major US exchanges (Nasdaq, NYSE, Arca, BATS, DirectEdge) and implementing the distinct semantics of each exchange's matching engine.
    • Constructed an order router translating from Xambala's native ordering protocol to the protocols of all major US equities exchanges and several dark pools.
    • Built a family of two-sided, liquidity-providing market-making algorithms for tight-spread, high-volume stocks.
    Technologies: OCaml, C++, GNU R, Python, NumPy, Matplotlib
  • Lead Engineer, Pricing and Risk Management

    2011 - 2012
    The Climate Corporation
    • Built Climate's weather-based crop insurance pricing algorithm based on a distributed Monte Carlo simulation of predicted weather patterns over the insured farm throughout the growing season.
    • Developed a Hadoop/MapReduce-based algorithm to produce the quantitative risk reports on the entire crop insurance portfolio for the reinsurance partners.
    • Researched the feasibility of expressing the insurance policy's payout calculation as a type of data by representing it as a first-class function in a functional programming language (Clojure).
    Technologies: Java, Clojure, Hadoop
  • Senior Engineer, Search Technology

    2009 - 2011
    • Built a Hadoop/MapReduce indexing algorithm to process the document corpus into a set of Lucene indexes for the search engine cluster. This was 100x faster than the previous version, which was based on a static cluster of Lucene indexing servers.
    • Developed a real-time Lucene read-write indexing service to make newly acquired data immediately accessible through the search engine. This complemented the main search cluster, which served the static document corpus, and was updated infrequently.
    • Leveraged Lucene inverted index technology to support k-nearest neighbors predictive modeling.
    Technologies: OCaml, Java, Lucene, Hadoop


  • EigenDog: Stochastic Gradient Boosting Learner Written in OCaml (Development)

    Friedman's stochastic gradient boosting machine (S-GBM) is a powerful machine learning algorithm that models the training dataset through an ensemble of decision trees. Like deep learning, S-GBM is a universal function approximator. Unlike deep learning, where the network training process involves gradient descent on the neural network's parameters, S-GBM relies on gradient descent in "function space": every step of the algorithm constructs a piecewise function in the form of a decision tree, whose addition to the model maximizes the reduction in training loss.
    This approach has several advantages over deep learning. In particular, it is possible to construct a cross-validation path, showing the tradeoff between variance and bias as a function of the number of trees in the ensemble. Early termination of the algorithm based on the cross-validation path obviates the need to decide the algorithm's hyperparameters ahead of time. This is in contrast with deep learning, where the network topology is fixed. S-GBM works remarkably well for structured data with a large number of categorical or ordinal (but not necessarily metric) variables.
    Dawg is an efficient implementation of S-GBM in OCaml. I worked on it from 2016 to 2017.
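To make the description above concrete, here is a minimal sketch of stochastic gradient boosting in plain Python. It is not Dawg's code (which is written in OCaml) and makes several simplifying assumptions: squared-error loss, a single numeric feature, depth-1 trees (stumps), and a single held-out validation set standing in for full cross-validation. All function names and parameters (`fit_stump`, `boost`, `patience`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_stump(xs, residuals):
    """Least-squares fit of a depth-1 tree; returns (threshold, left, right)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no valid split between identical feature values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, left_mean, right_mean)
    if best is None:  # degenerate sample: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def boost(train, valid, n_trees=200, learning_rate=0.1, subsample=0.5,
          patience=10, seed=0):
    """Return (model, path); path[t] is the validation loss with t + 1 trees."""
    rng = random.Random(seed)
    xs, ys = train
    vx, vy = valid
    base = sum(ys) / len(ys)  # initial constant model
    pred = [base] * len(xs)
    vpred = [base] * len(vx)
    stumps, path = [], []
    best_loss, best_size = mse(vy, vpred), 0
    for t in range(n_trees):
        # "Stochastic": each tree sees only a random subsample of the rows.
        idx = rng.sample(range(len(xs)), max(2, int(subsample * len(xs))))
        # For squared loss, the negative gradient is simply the residual.
        residuals = [ys[i] - pred[i] for i in idx]
        stump = fit_stump([xs[i] for i in idx], residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_value(stump, x)
                for p, x in zip(pred, xs)]
        vpred = [p + learning_rate * stump_value(stump, x)
                 for p, x in zip(vpred, vx)]
        path.append(mse(vy, vpred))
        if path[-1] < best_loss:
            best_loss, best_size = path[-1], t + 1
        elif t + 1 - best_size >= patience:
            break  # early termination: validation loss stopped improving
    model = {"base": base, "rate": learning_rate, "stumps": stumps[:best_size]}
    return model, path


def predict(model, x):
    return model["base"] + model["rate"] * sum(
        stump_value(s, x) for s in model["stumps"]
    )


def make_data(n, rng):
    """Synthetic data for the demo: a noisy sine curve."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(42)
model, path = boost(make_data(200, rng), make_data(100, rng))
print(len(model["stumps"]), "trees kept; best validation MSE", round(min(path), 4))
```

The `path` list plays the role of the validation path mentioned above: the ensemble size is read off the path after the fact rather than fixed in advance. A production implementation would use deeper trees, many features, and k-fold cross-validation.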


  • Languages

    OCaml, Python 3, Python, R, Scala, Bash, Bash Script, Java, C, C++, Julia
  • Frameworks

    Hadoop, Spark
  • Libraries/APIs

    Pandas, Apache Lucene, Scikit-learn, PySpark, TensorFlow, Sklearn
  • Tools

    Boto 3
  • Paradigms

    Data Science
  • Platforms

    Docker, Linux, Kubernetes
  • Industry Expertise

    Algorithmic Trading
  • Storage

    PostgreSQL, Data Validation, MySQL
  • Other

    Machine Learning, Time Series Analysis, Stochastic Modeling, Numerical Optimization, Deep Neural Networks, Model Validation, Predictive Analytics, AWS, Gradient Boosting, Gradient Boosted Trees, Random Forests, Bash Scripting, Big Data, Big Data Architecture, Computer Vision, Deep Learning, Neural Networks, Condor


  • Master's degree in Finance and Banking
    2009 - 2010
    SDA Bocconi School of Management, Bocconi University - Milano, Italy
  • Participated in an international exchange program (non-degree program) in Computer Science
    2000 - 2001
    École Polytechnique - Palaiseau, France
  • Engineer's degree in Computer Engineering and Electrical Engineering
    1996 - 2001
    Politecnico di Milano - Milano, Italy
