Data Scientist
2021 - PRESENT
Nordstrom
- Developed an optimization heuristic for time-series-based allocations under tight deadlines. The estimate used to compare the two solutions placed my heuristic within a couple of percentage points of the vendor-supported solution.
- Built a data-mining framework and program that produced datasets ready for optimization in NetworkX. The optimization determined which sets of items could fit, by total volume, in one building while minimizing the number of packages needed for multi-item orders.
- Defined initial project goals flexible enough to deliver future value to other projects, and that value was realized. Past time-series data could be reused for future testing, which reduced AWS S3 costs, computation, and rework time.
Technologies: Amazon S3 (AWS S3), Docker, GitLab
Founder
2019 - PRESENT
Political Hack
- Designed the platform and handled the source data.
- Produced visualizations, researched statistical methods, and networked with politically active and interested individuals and organizations.
- Developed the logo and marketing approach, and coded the prototype website.
- Created prototype website pages with a static file header.
- Discovered data sources from many different sites.
- Developed data pipelines.
- Designed unified data schema.
- Coded complementary visualizations.
Technologies: Django, React, D3.js, Python
Data Scientist
2019 - PRESENT
Tactical Foresight Consulting, LLC
- Used Python and R for data collection and statistical modeling, leveraging unsupervised models when labeled data was scarce.
- Identified and designed technical capabilities, and demonstrated proofs of concept (POCs) of those capabilities to the client.
- Created D3.js and Tableau visualizations that addressed clients' reporting needs.
- Built a program that parses court documents to count references to legislative statutes and detect novel combinations of laws.
- Used Bayesian networks to visualize the factors influencing a ballot measure's pass rate.
- Used NLP to build a graph of activities from data scraped from news articles.
- Created an unsupervised system to detect key events in claim adjusters' notes and implemented it for parallel processing.
- Created a system that detects a text's format in order to infer the text's purpose.
Technologies: Spark, Hadoop, Neo4j, D3.js, JavaScript, R, Python
Data Scientist (Consultant)
2018 - 2018
MatchPoint
- Proposed, built, and tested a framework of unsupervised methods for identifying suggested suppliers.
- Presented results clearly and developed flowcharts showing how the system works.
- Used natural language processing dependency trees to create categories for a training set.
- Extracted useful search features from the text, created classifications for matching and search problems, and ran experiments that produced a successful unsupervised matching algorithm with approximately 96% accuracy.
- Developed metaheuristics for creating and sourcing training datasets.
Technologies: Regex, SQL, Python
Data Scientist
2017 - 2018
Systematrix Solutions
- Used Spark MLlib via PySpark for outlier detection on GraphX RDDs.
- Presented and coded new algorithms for graph analytics using GraphX and Scala.
- Used PySpark for fraud analytics on banking records via RDD transformations, filters, and joins.
- Created, modified, and benchmarked machine-learning algorithms, running in a Docker container, for statistical inference on network properties and money-laundering prediction.
- Routinely flagged upcoming roadblocks to meeting project and customer needs before they became noticeable problems.
- Took the initiative to develop and present data privacy policies, standards, and processes aligned with local and international legal requirements.
- Translated fraud investigators' goals into the extraction of essential subgraphs via graph-property filters and traversals. This surfaced explicitly fraudulent connections and reduced analytics processing time.
- Prescribed a strategic approach for handling changing algorithmic regulations, bust-out fraud, and takeover fraud.
Technologies: Spark, Hadoop, Neo4j, D3.js, JavaScript, Scala, SQL, Python
Operational Intelligence Analyst
2015 - 2017
Stanford University
- Used mathematical techniques and fitted statistical models to analyze data related to business problems, and visualized the results in Tableau dashboards and Neo4j.
- Identified and visualized needed contextual data, patterns, summary statistics, and trends using methods including graph analytics, non-parametric ensemble models, Bayesian inference, and natural language processing (NLP).
- Adapted code for multicore parallel processing on computer clusters, and used MapReduce functions to aggregate customer-profile data that supplemented the Neo4j database.
- Used Cypher (Neo4j's query language) to add features such as fund amount to a graph database of transactions.
- Automated a system that categorizes any text using an unsupervised model, eliminating the need to find cluster centers manually and reducing the time needed to find density parameters.
- Leveraged GloVe vectors (or Word2Vec) to classify the risk of activities extracted from text with NLP, then modeled their impact as a network/graph.
- Built statistical frameworks and code using new machine-learning programs, then presented them at conferences and expos.
- Met with clients to understand their needs and designed solutions to meet them.
- Transferred, aggregated, and updated data on approvers of advances, credit cards, purchase orders, payments, and other financial and banking transactions in a NoSQL database (MongoDB) using JavaScript and Python.
- Visualized the above-mentioned data in a Tableau dashboard.
- Collaborated on multiple high-priority projects and made key contributions to the team's long-term strategy meetings.
- Solved problems with minimal oversight and explained the methodology in user-friendly terms.
Technologies: Tableau, MongoDB, SQL, Neo4j, R, Python