Senior Data Scientist
2020 - 2021
eka.care
- Developed a module that extracts relevant information from medical documents such as prescriptions, pathology lab reports, and vaccination certificates and makes it digitally available and searchable.
- Used the LayoutLM model to exploit the position of text on the page and extract key terms from medical documents.
- Developed an end-to-end pipeline on AWS, from document upload to entity extraction, including document classification and manual data annotation steps.
- Collaborated on designing medically relevant hierarchies for different medical conditions and symptoms using SNOMED CT, which helped provide contextual options to doctors in their prescription pad.
Technologies: Python, AWS, Amazon Web Services (AWS), Deep Learning, Machine Learning, Data Science, AWS S3, Amazon Athena, Jupyter Notebook, Data Analyst

Data Scientist
2020 - 2020
MYRM Technologies, LLC
- De-duplicated and cross-referenced customer records from a disorganized collection of spreadsheets for insertion into the Salesforce system.
- Designed a database used to migrate Salesforce data to a Ruby on Rails (RoR)-based system.
- Led the import of data from various sources into the Salesforce system for efficient tracking of leads and their progression through the stages of deal completion.
Technologies: Pandas, Salesforce, Matching Systems, Jupyter Notebook, AWS Athena, Data Analyst

Lead Data Scientist
2017 - 2020
MakeMyTrip
- Developed a hotel-ranking model that used a user's recent interactions to show relevant results.
- Built a user intent prediction model based on a customer's activity in the eCommerce funnel.
- Constructed the NLP part of a chatbot for handling the post-sales requirements of the business.
- Collaborated on the design of a feature marketplace—a kind of data warehouse that combined data from several sources for use by data science models.
- Created a universal search for the travel domain that allowed users to search for hotels and flights using free text, applying NLP techniques to extract the relevant fields from the text.
Technologies: Amazon SageMaker, PyTorch, Amazon Web Services (AWS), PySpark, Data Science, NumPy, Pandas, Apache Airflow, Redshift, Spark, Natural Language Processing (NLP), Machine Learning, Python, Artificial Intelligence (AI), Algorithms, Data Analysis, AWS S3, NoSQL, Amazon Athena, Jupyter Notebook, AWS Athena, Data Analyst

Data Scientist | Analyst
2019 - 2019
Mix Tech (via Toptal)
- Set up various dashboards over Redshift and Metabase to understand how the product was performing among different customer segments and devices.
- Analyzed customer data and monitored stats such as user retention, app installation/uninstallation rates, user engagement, daily/weekly/monthly/quarterly performance, and customer movement through the funnel.
- Developed a churn model using PySpark and Python, which was used to target customers based on their probability of churn.
Technologies: Amazon Web Services (AWS), Data Analytics, Spark, Machine Learning, Metabase, SQL, Redshift, Python, Data Analysis, Data Modeling, AWS S3, Amazon Athena, Jupyter Notebook, AWS Athena, Data Analyst

Research Assistant
2015 - 2017
Universitat Pompeu Fabra
- Developed a model linking household wealth to female infanticide in India through the marriage market.
- Estimated the structural model and conducted counterfactual policy simulations to inform interventions, running the heavy computational tasks on Amazon Web Services (AWS).
- Developed theoretical solutions of the model by deriving the equilibrium equations and checking the proofs, and simulated the model economy in MATLAB.
Technologies: Mathematica, MATLAB, Python, Economics, Data Modeling