- Math-Savvy Python Developer, MIT, 2016 - 2017
Technologies: Python, Git, NetworkX, MATLAB
- Ported Daedalus, software used to construct models of nanoparticles built from scaffolded DNA given a nearly arbitrary 3D target shape, from MATLAB to Python. Both the original MATLAB version and the new Python version are available at http://daedalus-dna-origami.org/. A key design constraint was that the Python version closely match the MATLAB version to better serve current lab members.
- Data Scientist and Web Developer, Doing, Inc., 2016 - 2017
Technologies: Python, SQLAlchemy, Doc2vec, Tf-idf, Neural networks, Graph theory, De-anonymization
- Led the process of scoping and selecting machine learning use cases, prototyping the chosen initiatives, and productizing the final models.
- Contributed to the development of Canonicalization, a deduplication project. The core of Doing's data is event postings scraped from several major event publishers, so we frequently encountered duplicate locations across sources and duplicate events both across and within sources. A distance-based test that in principle compares every event to every other event (and every location to every other location), optimized enough to be computationally feasible and reasonably fast, let us find events and locations so similar that they were likely the same. Built the project from scratch through productization.
- Helped build a tag extraction project. A short list of strong tags attached to each event helps users understand it quickly. The project was prototyped by combining Doc2vec and Tf-idf, and validated by systematically generating surveys in Google Docs through which the team rated the quality of the generated tags.
- Helped build a categorization project. Like tags, categories help us understand our events and help users navigate the available events. This project was also prototyped with Doc2vec, comparing each event to a whitelist of categories drawn from the most popular categories listed by our data sources. It reached the prototype stage.
- Contributed to the development of DoingRank. With significant event data but no user data at all (the startup's app is still unreleased), none of the supervised recommender algorithms fit. The first version, which only just reached the prototype stage, therefore had two components. The first encoded an abstract notion of event quality by formalizing the team's collective intuition about the properties of good event postings (a title that matches the description, consistent postings, etc.). The second was user-specific: it mapped RSVPs and other direct in-app interactions through tags and categories to form a high-level notion of each user's preferences.
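The distance-based duplicate test described for Canonicalization can be illustrated with a minimal Python sketch. The actual distance metric and optimization are not stated, so this assumes Jaccard distance over title words and "blocking" (only comparing events that share a date) as the speed-up; the field names `title` and `date`, the helper names, and the 0.3 threshold are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokens(text):
    """Lowercased alphanumeric words, used as a crude fingerprint of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(a, b):
    """1 - (size of intersection) / (size of union); 0.0 means identical sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def find_duplicates(events, threshold=0.3):
    """Return index pairs (i, j) of events that are likely the same event.

    Assumption: events are blocked by date, so only same-day events are
    compared instead of all n^2 pairs.
    """
    blocks = defaultdict(list)
    for i, event in enumerate(events):
        blocks[event["date"]].append(i)
    pairs = []
    for indices in blocks.values():
        for i, j in combinations(indices, 2):
            d = jaccard_distance(tokens(events[i]["title"]), tokens(events[j]["title"]))
            if d <= threshold:
                pairs.append((i, j))
    return pairs


events = [
    {"title": "Jazz Night at The Blue Room", "date": "2017-03-04"},
    {"title": "Jazz night @ the Blue Room!", "date": "2017-03-04"},
    {"title": "Pottery Workshop", "date": "2017-03-04"},
    {"title": "Jazz Night at The Blue Room", "date": "2017-03-05"},
]
print(find_duplicates(events))  # [(0, 1)]
```

Blocking is what keeps this feasible: the fourth event has an identical title but a different date, so it is never compared at all. A production version would likely block on more than the date and use a richer distance over several fields.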
- Machine Learning Researcher, Learning Sites, Inc., 2016 - 2016
Technologies: Python, Convolutional neural networks, Lasagne, Theano, Hyperparameter optimization
- Developed optical character recognition (OCR) for Egyptian Hieratic.
- Used a dataset built from a collection of 1,400 labeled pots in a New York City museum. The NSF-funded project aims to build a smartphone app that automates the translation of these texts; the whole pipeline had been built except for the OCR component.
- Delivered the solution as a software framework that built and returned a trained convolutional neural network as a stand-alone function. We then integrated that function into an app to automate the translation of Hieratic.
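The idea of a framework that returns a trained model as a stand-alone function can be sketched as a closure. This is a minimal sketch, not the original Lasagne/Theano code: a nearest-centroid classifier stands in for the convolutional network so no deep learning library is needed, and the glyph format, `build_classifier`, and the labels `sign_a`/`sign_b` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Image = Sequence[float]  # hypothetical: a flattened grayscale glyph image


def build_classifier(images: List[Image], labels: List[str]) -> Callable[[Image], str]:
    """Train on labeled glyphs and return a stand-alone prediction function.

    Stand-in model: a nearest-centroid classifier replaces the convolutional
    network; only the train-then-return-a-function interface is the point.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for img, label in zip(images, labels):
        acc = sums.setdefault(label, [0.0] * len(img))
        for k, px in enumerate(img):
            acc[k] += px
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(img: Image) -> str:
        # The trained parameters live in this closure, so the app calling
        # predict() needs no knowledge of the training code.
        return min(centroids, key=lambda lab: math.dist(img, centroids[lab]))

    return predict


predict = build_classifier(
    [[0, 0, 1, 1], [0, 0, 0.9, 1], [1, 1, 0, 0]],
    ["sign_a", "sign_a", "sign_b"],
)
print(predict([0.1, 0, 1, 0.8]))  # sign_a
```

Returning a plain function keeps the app decoupled from the training stack: swapping the model inside `build_classifier` does not change what the app calls.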
- Senior Data Scientist, Mobilepaks, 2014 - 2016
- Conceived, prototyped, and productized data science initiatives. Researched candidate models and built the most valuable ones into the app.
- Created a model that scores content relevance based on how users consume and react to content. The model turned out to be mathematically equivalent to a neural network, but because little data was available, its parameters were set primarily by interviewing domain experts.
- Created a model that tags content based on who consumes which content in which context (e.g., if many salespeople consume a document and nobody else touches it, the content is probably for salespeople).
- Identified and documented gaps in the client-facing reporting infrastructure and built new reports into the app as needed. My work focused mostly on the back end but occasionally required front-end work as well.
- Upgraded the search engine with spellcheck, autocomplete, and faceting on our existing tag infrastructure.
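The relevance and audience-tagging models described above can be sketched in a few lines each. Both are illustrations under stated assumptions: the relevance model is shown as a single logistic unit (the simplest neural network) with hypothetical expert-set weights, since the actual architecture is not described, and the tagging rule looks only at viewer roles and ignores the context dimension. The signal names, `min_share`, and `min_views` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical expert-elicited weights: with little training data, domain
# experts judge how strongly each usage signal indicates relevance.
EXPERT_WEIGHTS = {"views": 0.2, "shares": 1.5, "minutes_read": 0.8, "dismissals": -1.2}
BIAS = -1.0


def relevance(signals):
    """Relevance in (0, 1) from a single logistic unit over usage signals."""
    z = BIAS + sum(w * signals.get(name, 0.0) for name, w in EXPERT_WEIGHTS.items())
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def audience_tags(viewer_roles, min_share=0.8, min_views=5):
    """Tag content with any role accounting for at least min_share of its views."""
    counts = Counter(viewer_roles)
    total = sum(counts.values())
    if total < min_views:
        return []  # too little evidence to tag
    return sorted(role for role, n in counts.items() if n / total >= min_share)


print(audience_tags(["sales"] * 9 + ["engineering"]))  # ['sales']
```

The `min_views` floor mirrors the intuition in the example: a document read by two salespeople says much less than one read by hundreds.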
- Data Scientist, Cloudability (via Grimm Science), 2014 - 2014
Technologies: Python, R, Holt-Winters
- Surveyed time series prediction methods.
- Conducted a case study in R on time series prediction applied to server usage.
- Wrote a production-quality implementation of the chosen time series model (Holt-Winters) from scratch in Python.
- Calibrated forecast intervals (the expected accuracy of predictions) by measuring performance on training and test data sets.
- Documented model implementation and testing procedures to enable the client's engineering team to build the model into their dashboard.
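The Holt-Winters model and interval calibration described above can be sketched in plain Python. This is a minimal additive variant with a simple two-season initialization and fixed smoothing parameters `alpha`, `beta`, `gamma`. The interval method is an assumption, not necessarily the one used for the client: a quantile of in-sample one-step errors, whose coverage is then checked on held-out data. All function names are hypothetical.

```python
import math


def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing.

    y: observations (at least two full seasons); m: season length.
    Returns (fitted, forecasts): one-step-ahead in-sample predictions and
    forecasts for the next `horizon` steps.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialization from the first two seasons.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    fitted = []
    for t, obs in enumerate(y):
        s = season[t % m]
        fitted.append(level + trend + s)  # prediction made before seeing obs
        prev_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - level) + (1 - gamma) * s
    n = len(y)
    forecasts = [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]
    return fitted, forecasts


def interval_half_width(y, fitted, coverage=0.9):
    """Smallest in-sample absolute error e with at least `coverage` of errors <= e."""
    errors = sorted(abs(a - b) for a, b in zip(y, fitted))
    k = max(0, math.ceil(coverage * len(errors)) - 1)
    return errors[k]


def holdout_coverage(actual, forecasts, half_width):
    """Fraction of held-out points that land inside forecast +/- half_width."""
    hits = sum(abs(a - f) <= half_width for a, f in zip(actual, forecasts))
    return hits / len(actual)


history = [11.0, 9.0, 12.0, 8.0] * 3
fitted, forecasts = holt_winters_additive(history, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
half_width = interval_half_width(history, fitted, coverage=0.9)
print(forecasts, half_width)
```

Calibration then means comparing `holdout_coverage` on a test set against the target coverage and widening the interval if it falls short. One-step errors understate uncertainty at longer horizons, so in practice the half-width would likely be calibrated per horizon.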
- Senior Data Scientist, Sovolve (via Grimm Science), 2014 - 2014
Technologies: Linux, Python, PostgreSQL, Neo4j, Mixpanel
- Modeled user activity and interactions to optimize the user experience by filtering content to what is likely to be the most interesting and useful.
- Helped build out back-end data infrastructure to improve app performance and prepare for scalability.
- Ran A/B tests to inform product decisions.
- Clustered user behavior into distinct and comprehensible segments.
- Measured the app's virality and published the results internally to report on product success and guide product decisions.
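Two of the Sovolve analyses lend themselves to a short sketch: a two-proportion z-test, one standard way to evaluate an A/B test on conversion rates, and a viral coefficient (K-factor), one common definition of virality. Which tests and virality metric were actually used is not stated, and the function names are hypothetical.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between A and B conversion rates.

    Returns (z, p_value), using the pooled standard error under H0: p_a == p_b.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all or no conversions in both arms: no evidence of a difference
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


def viral_coefficient(users, invites_sent, invites_accepted):
    """K-factor: invites sent per user times the fraction of invites accepted."""
    if invites_sent == 0:
        return 0.0
    return (invites_sent / users) * (invites_accepted / invites_sent)


z, p = two_proportion_ztest(100, 1000, 150, 1000)
print(round(z, 2), p < 0.01)
```

For example, 100 conversions out of 1,000 in arm A against 150 out of 1,000 in arm B gives z of roughly 3.4 and a two-sided p-value well below 0.01. A K-factor above 1 means each cohort of users brings in a larger one, i.e., self-sustaining growth.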
- Data Scientist, PlayHaven, 2012 - 2014
Technologies: Linux, Python, GitHub, R, Hadoop Streaming
- Modeled and predicted user behavior in mobile games. Core projects included churn prediction and user path prediction.
- Managed relations between data science and engineering to catalyze productization of initiatives.
- Conducted ad hoc advanced analytics to assist in product decisions and to seed ideas for future data modeling.
- Rebuilt system logs: corrected errors in observed device identifiers and marked invalid log entries as such. Specifically, wrote an iterative MapReduce algorithm in Python, run with Hadoop Streaming, to find all connected components in a network of several billion nodes.
- Recruited, trained, and managed small teams of interns to assist with projects.
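The connected-components task described above is commonly solved with iterative min-label propagation, where each round is one MapReduce job. The sketch below assumes that approach (the source does not name the algorithm or say how the identifier graph was built) and simulates the map, shuffle, and reduce steps in memory; the node names and function names are hypothetical. In Hadoop Streaming, the mapper and reducer would instead be separate scripts reading tab-separated key/value lines from stdin, with the framework's sort doing the grouping, and each round would first need to join the current labels onto the edge list.

```python
from collections import defaultdict


def map_phase(edges, labels):
    """Mapper: every node re-emits its own label and sends it to its neighbors."""
    for node, label in labels.items():
        yield node, label
    for u, v in edges:
        yield u, labels[v]
        yield v, labels[u]


def reduce_phase(pairs):
    """Shuffle + reducer: group by node and keep the smallest label received."""
    grouped = defaultdict(list)
    for node, label in pairs:
        grouped[node].append(label)
    return {node: min(candidates) for node, candidates in grouped.items()}


def connected_components(edges):
    """Repeat MapReduce rounds until no label changes; equal labels = same component."""
    nodes = {node for edge in edges for node in edge}
    labels = {node: node for node in nodes}
    while True:
        new_labels = reduce_phase(map_phase(edges, labels))
        if new_labels == labels:
            return labels
        labels = new_labels


edges = [("a", "b"), ("b", "c"), ("x", "y")]
print(connected_components(edges))
```

Here `a`, `b`, and `c` end up sharing the label `a`, while `x` and `y` share `x`. The number of rounds grows with the graph's diameter, which is why the job is iterative rather than a single pass.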
- Data Miner, Software Engineer, and Data Engineer, Nike Sport Research Lab, 2011 - 2012
Technologies: C++, Python, MySQL, Wolfram Alpha
- Demonstrated data mining techniques.
- Defined roles for new full-time data miners in the lab.
- Created a database architecture to centralize the lab's data collection and analysis.
- Worked with researchers to import their personal research data into a consistent format.
- Liaised with lab researchers and the Wolfram team to build the centralized database.
- Research Assistant, Portland State University (Teuscher Lab), 2010 - 2011
Technologies: Linux, C++, ParadisEO, Evolutionary Algorithms, Traffic simulations, Complex network analysis
- Built an evolutionary algorithm in C++ with the ParadisEO library to evolve complex networks.
- Wrote a network evaluation utility to simulate traffic and calculate other metrics on networks representing massively parallel processors with non-traditional interconnections.
- Built out and documented the experimentation process to enable fellow researchers within and outside of the university to use my framework.
- Conducted experiments relating the properties of links to the types of networks in which they would be used optimally.
- Wrote a thesis on the creation of the framework and the results of initial experiments.
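The evolutionary approach described for the Teuscher Lab work can be illustrated in Python rather than C++ with ParadisEO. This is a minimal sketch under assumptions: a (1 + λ) evolution strategy that rewires one edge per mutation, with average shortest-path length (hop count) as a stand-in fitness for the traffic simulation. The starting ring topology, function names, and all parameters are hypothetical.

```python
import random
from collections import deque


def avg_shortest_path(n, edges):
    """Mean hop distance over all ordered node pairs; inf if disconnected."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for src in range(n):
        # Breadth-first search from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < n:
            return float("inf")
        total += sum(dist.values())
    return total / (n * (n - 1))


def mutate(edges, n, rng):
    """Rewire one random edge to a random unused node pair (edge count fixed).

    Assumes fewer than n * (n - 1) / 2 edges, so a free pair always exists.
    """
    child = set(edges)
    child.remove(rng.choice(sorted(child)))
    while True:
        u, v = sorted(rng.sample(range(n), 2))
        if (u, v) not in child:
            child.add((u, v))
            return child


def evolve(n, edges, generations=100, offspring=8, seed=0):
    """(1 + lambda) evolution strategy minimizing average path length."""
    rng = random.Random(seed)
    best, best_fit = set(edges), avg_shortest_path(n, edges)
    for _ in range(generations):
        children = [mutate(best, n, rng) for _ in range(offspring)]
        fit, child = min(((avg_shortest_path(n, c), c) for c in children), key=lambda p: p[0])
        if fit <= best_fit:  # keep the parent unless the best child is at least as good
            best, best_fit = child, fit
    return best, best_fit


n = 8
ring = {tuple(sorted((i, (i + 1) % n))) for i in range(n)}
best, fit = evolve(n, ring)
print(avg_shortest_path(n, ring), fit)
```

Because the parent is only replaced by an equal or better child, the evolved network's fitness never exceeds the starting ring's. A traffic-simulation fitness would slot in where `avg_shortest_path` is called.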