Dan Lecocq

Software Developer in Seattle, WA, United States

Member since May 6, 2014
Dan is an engineer and cowboy coder with a background in big data and distributed systems. He has extensive experience with profiling, optimization, asynchronous network I/O, and getting huge amounts of work pushed through a pipeline reliably and efficiently.
Dan is now available for hire

Portfolio

  • Moz
    Gevent, NSQL, HBase, Elasticsearch, Java, Ruby, C++, Python
  • IBM Research
    C++, WebSockets, Python


Location

Seattle, WA, United States

Availability

Part-time

Preferred Environment

JavaScript, Ruby, C++, Python, Git, Linux

The most amazing...

...thing I've coded is a system to crawl and index hundreds of millions of tweeted URLs within 10 minutes of being tweeted.

Employment

  • Senior Software Engineer

    2011 - PRESENT
    Moz
    • Rewrote a service that recursively crawls customer sites, then analyzes and reports on their SEO issues.
    • Wrote a queueing system (qless) that has been widely adopted, both internally and externally, for production systems.
    • Designed and implemented a service for crawling and indexing pages discovered through important RSS feeds.
    • Helped implement an algorithm that removes navigation, headers, and footers from web content before indexing; the work was eventually published.
    • Wrote a number of web crawlers for different purposes, contributing many well-used open source projects to the state of the art in web crawling along the way.
    • Crawled and processed tens of billions of pages across my various crawlers.
    • Worked to support our next generation of backlinks indexing infrastructure.
    Technologies: Gevent, NSQL, HBase, Elasticsearch, Java, Ruby, C++, Python
  • Graduate Researcher

    2010
    IBM Research
    • Worked on a collaboration between KAUST's supercomputing department and IBM Research.
    • Augmented a computational steering library to work with WebSockets.
    • Worked with Lawrence Berkeley National Lab toward eventual support for streaming visualization.
    • Targeted KAUST's supercomputing infrastructure, an IBM BlueGene/P.
    • Worked to enable researchers to examine, monitor, and update parameters of running simulations.
    Technologies: C++, WebSockets, Python

Experience

  • Shovel (Development)
    https://github.com/seomoz/shovel

    Simple command-line dispatch of Python functions. Developers regularly want to invoke small, simple Python functions from the command line, so I wrote this tool, which has become one of Moz's most popular repositories.

  • qless (Development)
    https://github.com/seomoz/qless

    A rich queueing system for Redis, used for production services both at Moz and elsewhere. It utilizes Redis's Lua script support to implement complex atomic operations for queueing. It consists of a Lua core (https://github.com/seomoz/qless-core) and Ruby (https://github.com/seomoz/qless) and Python (https://github.com/seomoz/qless-py) bindings.

  • simhash-py (Development)
    https://github.com/seomoz/simhash-py

    Fast simhash in Python. It supports maintaining and finding near-duplicates in a set of documents with extreme speed. It consists of our underlying library simhash-cpp (https://github.com/seomoz/simhash-cpp) and the surrounding Python bindings.

  • pyreBloom (Development)
    https://github.com/seomoz/pyreBloom

    Extremely fast Bloom filter operations on a Redis instance. Redis itself handles all the persistence, while this library is implemented as a highly efficient Python C extension.

  • dragnet (Development)
    https://github.com/seomoz/dragnet

    Web page content extraction. This is the implementation supporting some published work (http://dl.acm.org/citation.cfm?id=2487828) where we separate the main content of web page articles and blog posts from the other components (navigation, headers, footers, etc.).
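The simhash-py entry above describes near-duplicate detection with simhash. As a rough illustration of the technique only, here is a minimal pure-Python sketch that fingerprints a document and compares fingerprints by Hamming distance. It is not simhash-py's API: the function names (`token_hash`, `simhash`, `hamming_distance`, `is_near_duplicate`), the MD5-based token hash, the 3-word shingling, and the 3-bit threshold are all assumptions chosen for illustration, and the sketch does not attempt the indexing needed to search a large document set quickly.

```python
import hashlib
import re


def token_hash(token: str) -> int:
    """Return a stable 64-bit hash: the first 8 bytes of the token's MD5 digest."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """Return a 64-bit simhash fingerprint built from 3-word shingles of `text`."""
    words = re.findall(r"\w+", text.lower())
    # Overlapping 3-word shingles; very short texts fall back to single words.
    shingles = [" ".join(words[i:i + 3]) for i in range(len(words) - 2)] or words
    votes = [0] * 64
    for shingle in shingles:
        h = token_hash(shingle)
        for bit in range(64):
            # A set bit votes +1 for its position, a clear bit votes -1.
            votes[bit] += 1 if (h >> bit) & 1 else -1
    # The fingerprint has a 1 wherever the votes came out positive.
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_bits: int = 3) -> bool:
    """Treat fingerprints within `max_bits` differing bits as near-duplicates (assumed threshold)."""
    return hamming_distance(a, b) <= max_bits


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog near the river bank"
    edited = "the quick brown fox jumps over the lazy dog near the river bend"
    print(hamming_distance(simhash(doc), simhash(edited)))
```

Because similar documents share most of their shingles, their bit votes mostly agree, so their fingerprints differ in only a few positions; unrelated documents differ in roughly half of the 64 bits.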
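The pyreBloom entry above keeps its Bloom filter in Redis and does the work in a Python C extension. To show what the underlying data structure does, here is a pure-Python, in-memory sketch in which the bit array is a local `bytearray` instead of a Redis value. The `BloomFilter` class, its `capacity`/`error_rate` parameters, and the SHA-256 double-hashing scheme are assumptions for illustration and are not pyreBloom's API.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Minimal in-memory Bloom filter whose bit array is a local bytearray (illustrative only)."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Standard sizing for n items at false-positive rate p:
        #   m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        """Set every bit position the key hashes to."""
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # False: the key was definitely never added. True: it probably was,
        # with false positives at roughly `error_rate` once `capacity` keys are in.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


if __name__ == "__main__":
    seen = BloomFilter(capacity=1000, error_rate=0.01)
    seen.add("http://example.com/a")
    print("http://example.com/a" in seen)  # True
```

A crawler can use a filter like this to skip URLs it has probably already fetched: it never misses a URL it has added, and in exchange for a small false-positive rate it uses far less memory than storing every URL.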

Skills

  • Languages

    Python, JavaScript, C++, Java, Lua, Ruby
  • Paradigms

    Distributed Programming, Concurrent Programming, Test-driven Development (TDD)
  • Platforms

    Linux
  • Storage

    AWS S3, Redis, Elasticsearch, HBase, NSQL, MySQL
  • Other

    Open Source, WebSockets
  • Libraries/APIs

    Gevent, jQuery
  • Tools

    Git

Education

  • Master's degree in Applied Mathematics and Computational Science
    2009 - 2010
    King Abdullah University of Science and Technology - Thuwal, Saudi Arabia
  • Bachelor's degree in Computer Science
    2004 - 2009
    Colorado School of Mines - Golden, CO
