Pawel Kaplanski, Machine Learning Developer in Sydney, New South Wales, Australia

Member since January 31, 2019
Pawel is an experienced data scientist and machine learning professional. He has worked for Fortune 100 companies and has an academic background in the field. Before moving to data science, he was a lead architect at the Samsung R&D Center. Pawel holds a Ph.D. in knowledge representation and reasoning as well as master's and bachelor's degrees in computer science.
Pawel is now available for hire

Location

Sydney, New South Wales, Australia

Availability

Part-time

Preferred Environment

Python

The most amazing...

...thing I've coded is a Clinical Decision Support System implementing the ESMO guidelines for cancer treatment.

Employment

  • Senior Machine Learning Engineer

    2019 - PRESENT
    Undisclosed
    • Brought recommender systems, image processing, NLP, and deep learning models into production.
    Technologies: PyTorch, Python, TensorFlow
  • Data Scientist

    2011 - PRESENT
    Cognitum
    • Created machine learning models using scikit-learn and TensorFlow for a Fortune 100 customer in the area of trade promotion optimization.
    • Created a cognitive programming language that simplifies AI programming by allowing reasoning to be mixed with machine learning; it was used in a fraud detection system for a public institution.
    • Designed and implemented a controlled natural language for formalizing knowledge about lung cancer, used by oncologists to formalize the ESMO guidelines.
    • Created affective computing AI models that combine expert knowledge with the experts' intuitions to calculate the quality score of complex decisions.
    • Created a novel automated user interface synthesis algorithm that translates a set of requirements into a working application; it is currently used by 30+ clinical centers and the biggest telecom company in Australia.
    • Created an NLP classification algorithm for legal document corpora based on the NLTK library. It combines several feature-extraction techniques (POS tagging, noun-phrase extraction, collocations, and named entity recognition), followed by TF-IDF weighting, feature reduction, and classification with a scalable Passive-Aggressive classifier.
    • Created a critical part of a tax-fraud detection system based on natural language rules that enable decision makers and specialists to manage a tax-fraud knowledge base. Its stream-based reasoner discovers fraudulent activities in a stream of 5 million invoices per day.
    Technologies: Apache Jena, SKOS, SPARQL, Semantic Web Rule Language (SWRL), RDF, OWL, BPMN, TensorFlow, NumPy, Scikit-learn, NLTK, R, Python
  • Assistant Professor

    2013 - 2017
    Gdansk University of Technology, Department of Applied Informatics in Management
    • Reviewed for “Government Information Quarterly: An International Journal of Information Technology Management, Policies, and Practices” (IF = 2.515, 5-year IF = 3.161).
    • Acted as an academic visitor at the University of Newcastle, Australia.
    • Participated in the EU Marie Curie research project "Smart multipurpose knowledge administration environment for intelligent decision support systems development."
    • Reviewed for and contributed to the “18th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems.”
    • Served as a member of the international BRIDGE project: "CDSS for Oncology."
    • Taught the following classes: R Programming, Introduction to Data Science, Business Intelligence and Big Data Processing, and Software Development Process Methodology and Tools.
    Technologies: Python, R
  • Lead Architect

    2006 - 2011
    Samsung
    • Led design and implementation of an industrial software stack for digital television receivers.
    • Led design and implementation of a set-top-box device emulator for efficient application-level testing.
    • Designed and implemented an automated smoke-test system with ASP.NET, MSMQ, image recognition, and remote-control emulation.
    • Technically managed a team of 30+ programmers.
    • Trained newcomers in advanced multithreaded design patterns in C++.
    Technologies: Embedded Systems, C++

Experience

  • CDSS - Clinical Decision Support System
    https://2016.semantics.cc/projects/registry-clinical-data-ptcho-%E2%80%93-rectal-cancer-module

    Clinical registries are needed to perform research studies and thus to increase the medical knowledge that finds its way into new and improved guidelines. Adherence to clinical practice guidelines is essential for increasing the effectiveness of treatments and eliminating the negative consequences of medical decisions.

    We organized the available data from many sources, such as studies, publications, and recommendations, into a knowledge base of the diagnostic process that supports doctors' decisions. We also developed a central registry for collecting patients' clinical data from over 70 oncological institutions in Poland. It has been in production since 2016.

    The results were published in Expert Systems with Applications, which is currently ranked first by h-index in Google Scholar's list of top publications in artificial intelligence.

  • Trade Promotion Optimization

    In most big FMCG enterprises, sales analysts are responsible for preparing the promotion plan for the new quarter. These plans are typically created manually with conventional tools such as Excel, in an attempt to answer typical TPO (trade promotion optimization) questions such as:
    - Can we lower overall costs by optimizing product sales volume and promotion strategy, anticipating a promotion calendar for a given period?
    - Can we use key indicators to predict which sales pattern is most effective, and when, so that it can be used to increase sales volume?
    - Can we set up a useful promotion calendar for “slow-moving products”?
    - Can we optimize budget KPIs when planning the next sales period?

    In our case, mis-forecasting (the average error was around 20%) led to budget reductions across multiple stages of the supply chain. To solve the problem, we combined the business knowledge of subject matter experts with the historical sales data we received, taking its anomalies and outliers into account.

    The solution allowed the company to improve the prediction accuracy of its volume planning by up to 10%.

  • Tax-fraud Detection for VAT

    The tax-fraud detection system is based on natural language rules that enable decision makers and specialists to manage a tax-fraud knowledge base. AI agents reason over this knowledge to recognize elaborate fraud and non-compliance patterns, and a stream-based reasoner discovers fraudulent activities in a stream of 5 million invoices per day.

  • Automated Decision Making System

    Before signing a contract, especially one worth $10+ million, the CEO has to analyze the business situation and choose a good strategy. The CEO makes highly contextual, time-sensitive decisions that have to factor in priorities such as risk aversion or profitability.

    To amplify the CEO's gut feelings, we developed an automated decision-making system. Its core consists of affective computing AI models adapted to combine expert knowledge and intuitions to calculate a quality score for each deal or opportunity. Solving a complex business problem requires a manager to weigh multiple, conflicting objectives, and we observed that the solution must wrap the AI models in a user-friendly UI with a drag-and-drop editor for tuning the expert knowledge consumed by the models. The results are then visualized for the CEO on a custom dashboard.

  • Abusive-clause Detector

    Processing large corpora of legal documents to find potentially abusive clauses is very resource-intensive and usually requires hiring a team of lawyers. Modern NLP methods can reduce this effort. I developed a Python classifier based on the NLTK library that automated the client's daily work. The pipeline followed a classical approach: feature-extraction techniques such as n-grams, POS tagging, noun-phrase extraction, collocations, and named entity recognition (NER), followed by TF-IDF weighting, feature reduction, and finally classification with a scalable Passive-Aggressive classifier.
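    The classical pipeline described above (n-gram features, TF-IDF weighting, Passive-Aggressive classification) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the original NLTK-based implementation: only word unigrams and bigrams are used (POS tagging, noun-phrase extraction, collocations, NER, and feature reduction are omitted), the classifier uses the PA-I update rule, and the names `TfIdf` and `PassiveAggressive` as well as the toy clauses are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: NOT the original NLTK-based implementation.
# Assumption: word unigrams + bigrams stand in for the richer features
# (POS tags, noun phrases, collocations, NER); feature reduction is omitted.


def tokenize(text):
    """Lowercase alphabetic tokens (a crude stand-in for NLTK tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n_max=2):
    """Return unigrams plus contiguous n-grams up to length n_max."""
    feats = list(tokens)
    for n in range(2, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


class TfIdf:
    """Hypothetical minimal TF-IDF vectorizer producing sparse dict vectors."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(ngrams(tokenize(doc))))
        n = len(docs)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, doc):
        # Features unseen during fit are dropped; the vector is L2-normalized.
        tf = Counter(t for t in ngrams(tokenize(doc)) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}


class PassiveAggressive:
    """Hypothetical binary Passive-Aggressive (PA-I) classifier, labels +1/-1."""

    def __init__(self, C=1.0, epochs=10):
        self.C, self.epochs, self.w = C, epochs, {}

    def score(self, x):
        return sum(self.w.get(t, 0.0) * v for t, v in x.items())

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                loss = max(0.0, 1.0 - label * self.score(x))  # hinge loss
                sq_norm = sum(v * v for v in x.values())
                if loss > 0 and sq_norm > 0:
                    # PA-I step: smallest weight change that zeroes the
                    # hinge loss, with the step size capped at C.
                    tau = min(self.C, loss / sq_norm)
                    for t, v in x.items():
                        self.w[t] = self.w.get(t, 0.0) + tau * label * v
        return self

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1


# Toy clauses invented for illustration: +1 = potentially abusive, -1 = fair.
train = [
    ("the seller may change the price at any time without notice", 1),
    ("the consumer waives all rights to file a complaint", 1),
    ("the buyer may withdraw from the contract within fourteen days", -1),
    ("the seller delivers the goods within thirty days", -1),
]
vec = TfIdf().fit([doc for doc, _ in train])
X = [vec.transform(doc) for doc, _ in train]
clf = PassiveAggressive(C=1.0, epochs=20).fit(X, [label for _, label in train])
print(clf.predict(vec.transform("the consumer waives the right to a complaint")))  # → 1
```

    The online PA update touches only the features of the current document, which is what makes this kind of classifier scale to large corpora. In practice, scikit-learn offers equivalent building blocks (`TfidfVectorizer`, `PassiveAggressiveClassifier`), and NLTK's `pos_tag` and `ne_chunk` could supply the POS and NER features omitted here.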

  • Cyber Assessment

    Developed a tool that encodes the knowledge of cybersecurity experts for a cybersecurity solution provider. The provider offers a strong, policy-driven security architecture that is enforced through defined security domains and controls and implemented through established standards, guidelines, and procedures.

    The tool lets customers perform a guided cybersecurity health check. When the health check is complete, the tool generates a detailed report (diagnosis) that helps the customer understand the company's current cybersecurity maturity level and its weak points. The report also estimates the potential cost of the problems found.

Skills

  • Languages

    OWL, RDF, SPARQL, R, SQL, C++, Java, C#, Python, Semantic Web Rule Language (SWRL), JavaScript, T-SQL, UML, XML
  • Frameworks

    Apache Jena, Ontology Framework, TinkerPop, .NET
  • Libraries/APIs

    NLTK, OWL API, TensorFlow, Scikit-learn, Keras, NumPy, Pandas, PyTorch, PySpark, SymPy, SciPy
  • Tools

    Protégé, SikuliX, Microsoft Visual Studio, Git, Jira, OpenLink Virtuoso, Apache Solr
  • Paradigms

    Data Science, Anomaly Detection, BPMN, Scrum
  • Other

    WordNet, Genetic Algorithms, Natural Language Processing (NLP), Machine Learning, Recurrent Neural Networks, Deep Learning, Classification Algorithms, Regression Modeling, Clustering Algorithms, Bayesian Inference & Modeling, Logistic Regression, Decision Trees, Random Forests, Markov Model, Ensemble Methods, Evolutionary Algorithms, Sesame, Data Visualization, Scalable Architecture, Time Series Analysis, Principal Component Analysis (PCA), SKOS, Embedded Systems, Schema.org
  • Platforms

    Azure, AWS EC2, Jupyter Notebook, Amazon Web Services (AWS), RStudio
  • Storage

    Cassandra, Titan Graph, Oracle SQL, MySQL

Education

  • Ph.D. in Computer Science
    2009 - 2013
    Gdansk University of Technology - Gdańsk, Poland
  • Master of Engineering degree in Computer Science
    1999 - 2001
    Wroclaw University of Technology - Wrocław, Poland
  • Bachelor of Engineering degree in Computer Science
    1996 - 1999
    Wroclaw University of Technology - Wrocław, Poland

Certifications

  • Sequence Models
    FEBRUARY 2018 - PRESENT
    Coursera
  • Deep Learning Specialization
    FEBRUARY 2018 - PRESENT
    Coursera
  • Convolutional Neural Networks
    OCTOBER 2017 - PRESENT
    Coursera
  • Structuring Machine Learning Projects
    SEPTEMBER 2017 - PRESENT
    Coursera
  • Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
    SEPTEMBER 2017 - PRESENT
    Coursera
  • Neural Networks and Deep Learning
    AUGUST 2017 - PRESENT
    Coursera
  • Oracle Certified Professional, Java SE 5 Programmer
    FEBRUARY 2011 - PRESENT
    Oracle
