Daniel O'Huiginn, Data Engineering Developer in Berlin, Germany

Member since March 20, 2017
Daniel likes code, words, and data. Starting as a Python developer, he's moved gradually from web back-end work to more data-driven projects. After spending time working in classic big data and data science, he found a niche in investigative data journalism—learning skills he now likes to use more commercially.

Location

Berlin, Germany

Availability

Full-time

Preferred Environment

Git, Linux, IntelliJ

The most amazing...

...tool I've built has been used to expose corruption in Azerbaijan and Uzbekistan, to find $300 million of undeclared offshore assets, and to enable prosecutions.

Employment

  • Senior Engineer

    2021 - PRESENT
    Grata
    • Built an internal API to extract and geolocate addresses from web pages.
    • Worked remotely within a large, established team of 20+ developers, collaborating via Jira, GitHub, and Slack.
    • Mentored and onboarded a junior developer via code review, pair programming, and general advice.
    Technologies: Python, Containers, Docker, Elasticsearch, Pelias, Pytest, Kubernetes, CI/CD Pipelines, GIS
  • Geospatial Developer

    2018 - 2019
    Telecommunications Industry Client
    • Built a tool to plan antenna locations for telecoms.
    • Combined aerial imaging with government and open data to generate special-purpose maps.
    • Enabled a web application backed by terabytes of source data by optimizing the entire data pipeline: Linux server admin, PostGIS database, Python data-processing, web back end, JavaScript front end, and data visualization.
    Technologies: PostGIS, Django, PostgreSQL, Geospatial Data, GIS, Angular, LiDAR, Open Data, Databases, Pytest
  • Machine Learning Developer

    2017 - 2017
    Freelance Work (Independent Contract Work)
    • Built a natural language processing (NLP) system to match free-form text queries to appropriate product offers.
    • Created a search tool using Elasticsearch, integrated with NLP tools.
    • Developed an API to enable integration with other systems.
    Technologies: Natural Language Processing (NLP), Machine Learning, Text Generation, SpaCy, Elasticsearch, Scikit-learn, NumPy, APIs, Pandas
  • Lead Developer

    2015 - 2017
    OpenOil
    • Built a database of corporate filings from the energy and mining industries, with full-stack responsibility: web front end and back end, data engineering and ETL, database administration, and DevOps.
    • Supported financial modeling by supplying data.
    • Created data visualizations combining financial, geographical, and qualitative data.
    • Performed data analysis using Linux shell tools.
    Technologies: Amazon Web Services (AWS), Sed, AWK, Bash, GitHub Pages, jQuery, Statistics, Microsoft Excel, Amazon EC2, Amazon S3 (AWS S3), Flask, JavaScript, AngularJS, OCR, Celery, PostgreSQL, Python, Docker, Elasticsearch, Databases
  • Developer | Data Engineer

    2006 - 2017
    Freelance Work (Independent Contract Work)
    • Used natural language processing to extract treatment histories from medical correspondence.
    • Implemented automatic clustering of Russian-language news articles for an academic research project.
    • Led the development of an online film distribution platform and scaled it to handle 500+ requests per second.
    • Administered up to 40 servers for web and data-analysis workflows, including Docker.
    • Rewrote Python code as PHP and maintained PHP code for web services and data scraping.
    • Handled DevOps and system administration of Linux servers.
    • Handled full-stack development of a social media aggregation website.
    • Performed image processing for a financial-industry client.
    • Delivered smaller web development projects using Django, WordPress, JavaScript, jQuery, Drupal, Pylons, and TurboGears.
    • Wrote content including technical documentation, website copy, articles on cultural issues, and French-English translations.
    Technologies: CSS, Pandas, NumPy, NLTK, MySQL, Memcached, NGINX, Apache, Drupal, PHP, Django, Flask, NoSQL, PostgreSQL, JavaScript, HTML, Linux, Python, Databases
  • Developer

    2013 - 2014
    Organized Crime and Corruption Reporting Project
    • Helped a world-class team of investigative journalists to use technology in their work, such as data analysis, data journalism, security, and training.
    • Researched several stories with substantial international impact.
    • Acted as the project manager and lead developer for a research service for investigative journalists.
    • Rapidly built a Django website for an extensive leaked database.
    Technologies: CSS, Google App Engine, PostgreSQL, Elasticsearch, Django, HTML, Python, Project Management
  • Senior Developer

    2011 - 2012
    Zugo Services
    • Performed data engineering using a MapReduce system to collect and process terabytes of data.
    • Scaled a data ingestion pipeline (MongoDB, NGINX) to handle write loads of 1,000+ requests per second.
    • Used statistics and machine learning to generate insight from big data and to forecast customer behavior.
    • Was responsible for the reliability of a system with over 1 million users.
    • Worked on a browser extension.
    • Worked in an Agile/Scrum team using test-driven development and code review.
    Technologies: Erlang, JavaScript, NumPy, Scikit-learn, R, MongoDB, MapReduce, Big Data, Machine Learning

Experience

  • Investigative Dashboard
    https://investigativedashboard.org/

    Investigative Dashboard is a tool that helps investigative journalists use public records to research their stories. It combines a document database and search system with a research help desk. I led its development from 2013 to 2014.

  • Open Data Tour of Tanzania
    http://tanzania.openoil.net

    A showcase of data-driven work on the energy industry: geodata, financial modeling, and mapping of corporate structures.

Skills

  • Languages

    JavaScript, Python 3, Python 2, Python, SQL, Curl Language, ECMAScript (ES6), Bash Script, HTML, Bourne Shell, Bash, HTML5, AWK, Sed, Erlang, CSS, PHP 7, PHP 5, CSS3, Sass, PHP, Ruby, Java, Go, R
  • Frameworks

    Flask, Django, Nose, Pylons, TurboGears, AngularJS, Bootstrap, Angular
  • Libraries/APIs

    Beautiful Soup, Pandas, NumPy, Node.js, JSONP, REST APIs, SQLAlchemy, Scikit-learn, NLTK, Python Asyncio, Stanford NLP, SpaCy, PyTorch, TensorFlow, Keras, AMQP, FFmpeg, Google Maps API, Fabric, jQuery, Django ORM, SciPy
  • Tools

    Git, Shell, *nix Shells, Docker Swarm, Docker Compose, cURL Command Line Tool, NGINX, Emacs, GitHub Pages, Pytest, Jupyter, Celery, Logging, uWSGI, GIS, Google Sheets, Amazon Simple Queue Service (SQS), SPSS, RabbitMQ, GitHub, Microsoft Excel, Apache, NPM, Jira, IntelliJ
  • Paradigms

    DevOps, Data Science, Agile, Test-driven Development (TDD), REST, Microservices, MapReduce, RESTful Development, Continuous Integration (CI)
  • Platforms

    Amazon Web Services (AWS), Debian, Linux, Ubuntu, Amazon EC2, Google App Engine, Docker, Jupyter Notebook, Apache2, WordPress, CentOS, Mapbox, Drupal, Red Hat Linux, Kubernetes
  • Storage

    JSON, RDBMS, NoSQL, PostgreSQL, Databases, MariaDB, MySQL, Amazon S3 (AWS S3), Elasticsearch, Memcached, MongoDB, Neo4j, Redis, Google Cloud, PostGIS
  • Other

    Back-end Development, Machine Learning, Research & Investigation, APIs, Natural Language Processing (NLP), Shell Commands, Scraping, Data Scraping, Web Scraping, lxml, Ubuntu Server, BitTorrent, Text Mining, Back-end, Web Development, Screen Scraping, Data Engineering, Writing & Editing, Data Analytics, Journalism, Regression Modeling, Linear Regression, Algorithms, Full-stack, Data Mining, Unix Shell Scripting, Code Review, Visualization, Information Visualization, Containers, Container Orchestration, Data Architecture, Architecture, Technical Project Management, Software Project Management, Source Code Review, Data Structures, Statistical Modeling, Statistics, Data Visualization, Matrix Algebra, Big Data, Documentation, RESTful Web Services, RESTful Microservices, RESTful Services, Deep Learning, Cython, Chatbots, Mathematics, Data Wrangling, Algebra, Big Data Architecture, Linear Algebra, Bayesian Statistics, SVMs, Forecasting, Scalability, Single-page Applications (SPA), Search, Tornado, SSL Certificates, SSL Configurations, SSL, HTTP, mod_wsgi, Computational Economics, Gunicorn, QGIS, Encryption, OCR, Load Balancers, Geodatabases, Financial Data, HTTPS, TCP/IP, Amazon Route 53, Open Data, Networks, Financial Modeling, System Administration, Geospatial Data, LiDAR, Text Generation, Pelias, CI/CD Pipelines
  • Industry Expertise

    Project Management, Security

Education

  • Bachelor of Arts Degree in Sanskrit and South Asian Studies
    2001 - 2005
    University of Cambridge - Cambridge, UK
