Dan Lecocq

Seattle, United States
Member since May 6, 2014
Dan is an engineer and cowboy coder with a background in big data and distributed systems. He has extensive experience with profiling, optimization, asynchronous network I/O, and getting huge amounts of work pushed through a pipeline reliably and efficiently.
Dan is now available for hire
Portfolio
  • Moz
    Python, C++, Ruby, Java, Elasticsearch, HBase, qless, NSQ, gevent
  • IBM Research
    Python, WebSockets, C++
Experience
  • Distributed Programming, 7 years
  • Python, 5 years
  • Concurrent Programming, 4 years
  • C++, 3 years
  • Test-driven Development (TDD), 2 years
  • HBase, 1 year
Availability
Part-time
Preferred Environment
Linux, Git, Python, C++, Ruby, JavaScript
The most amazing...
...thing I've coded is a system to crawl and index hundreds of millions of tweeted URLs within 10 minutes of being tweeted.
Employment
  • Senior Software Engineer
    Moz
    2011 - PRESENT
    • Rewrote a service that recursively crawls customer sites, then analyzes and reports their SEO issues.
    • Wrote a queueing system (qless) that has been widely adopted, both internally and externally, for production systems.
    • Designed and implemented a service for crawling and indexing pages discovered through important RSS feeds.
    • Helped to implement an algorithm to remove navigation, headers, and footers from web content for the purposes of indexing (eventually published).
    • Wrote a number of web crawlers for different purposes, along the way contributing many widely used open-source projects to the state of the art in web crawling.
    • Crawled and processed tens of billions of pages across my various crawlers.
    • Worked to support our next generation of backlinks indexing infrastructure.
    Technologies: Python, C++, Ruby, Java, Elasticsearch, HBase, qless, NSQ, gevent
  • Graduate Researcher
    IBM Research
    2010
    • Worked on a collaboration between KAUST's supercomputing department and IBM Research.
    • Augmented a computational steering library to work with WebSockets.
    • Worked with Lawrence Berkeley National Lab toward eventual support for streaming visualization.
    • Targeted KAUST's supercomputing infrastructure, an IBM BlueGene/P.
    • Worked to enable researchers to examine, monitor, and update parameters of running simulations.
    Technologies: Python, WebSockets, C++
Projects
  • Shovel (Development)
    https://github.com/seomoz/shovel

    Simple command-line dispatch of Python functions. Users regularly find themselves wanting to invoke small, simple Python functions from the command line, so I wrote Shovel, which has become one of Moz's most popular repositories.

  • qless (Development)
    https://github.com/seomoz/qless

    A rich queueing system for Redis, used for production services both at Moz and elsewhere. It utilizes Redis's Lua script support to implement complex atomic operations for queueing. It consists of a Lua core (https://github.com/seomoz/qless-core) and Ruby (https://github.com/seomoz/qless) and Python (https://github.com/seomoz/qless-py) bindings.

  • simhash-py (Development)
    https://github.com/seomoz/simhash-py

    Fast simhash in Python. It maintains a set of document fingerprints and finds near-duplicates among them extremely quickly. It consists of our underlying C++ library, simhash-cpp (https://github.com/seomoz/simhash-cpp), and the surrounding Python bindings.

  • pyreBloom (Development)
    https://github.com/seomoz/pyreBloom

    Extremely fast Bloom filter operations backed by a Redis instance. Redis itself handles all the persistence, while this library is implemented as a highly efficient Python C extension.

  • dragnet (Development)
    https://github.com/seomoz/dragnet

    Web page content extraction. This is the implementation supporting some published work (http://dl.acm.org/citation.cfm?id=2487828) where we separate the main content of web page articles and blog posts from the other components (navigation, headers, footers, etc.).
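Shovel's own decorator and invocation syntax are not reproduced here. As a rough illustration of dispatching Python functions from the command line, the sketch below registers functions with a hypothetical `task` decorator and uses the standard library's `argparse` to select one by name and pass it positional string arguments. The names `TASKS`, `task`, `add`, `greet`, and `dispatch` are assumptions for illustration, not Shovel's API.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping a command name to a plain Python function.
TASKS: Dict[str, Callable[..., object]] = {}


def task(func: Callable[..., object]) -> Callable[..., object]:
    """Register func so it can be invoked by name from the command line."""
    TASKS[func.__name__] = func
    return func


@task
def add(a: str, b: str) -> int:
    # Command-line arguments arrive as strings, so convert them here.
    return int(a) + int(b)


@task
def greet(name: str) -> str:
    return f"Hello, {name}!"


def dispatch(argv: Optional[List[str]] = None) -> object:
    """Parse 'TASK [ARG ...]' and call the matching registered function."""
    parser = argparse.ArgumentParser(description="Run a registered Python function.")
    parser.add_argument("task", choices=sorted(TASKS), help="function to run")
    parser.add_argument("args", nargs="*", help="positional arguments for the function")
    ns = parser.parse_args(argv)
    return TASKS[ns.task](*ns.args)
```

Printing the result of `dispatch(sys.argv[1:])` from a script's entry point would make a command such as `python tasks.py add 2 3` (hypothetical file name) print `5`. Passing every argument through as a string and converting inside each function keeps the dispatcher itself trivial.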
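qless implements its multi-step queue operations as Lua scripts inside Redis, which runs each script to completion without interleaving other commands. The toy, in-memory model below only illustrates why that atomicity matters for qless: handing out a job and recording which worker holds it must happen as one step, or two workers could claim the same job. A `threading.Lock` stands in for Redis's script execution, and the class name, method names, and lease-expiry behavior are assumptions for illustration rather than qless's actual data model.

```python
import threading
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class InMemoryQueue:
    """Toy model of an atomic pop-and-lease queue operation.

    A threading.Lock plays the role that atomic Lua script execution plays
    in Redis. The lease/expiry logic is an illustrative assumption.
    """

    def __init__(self, lease_seconds: float = 60.0) -> None:
        self._lock = threading.Lock()
        self._waiting: Deque[str] = deque()
        # job id -> (worker name, lease expiry timestamp)
        self._running: Dict[str, Tuple[str, float]] = {}
        self.lease_seconds = lease_seconds

    def put(self, jid: str) -> None:
        with self._lock:
            self._waiting.append(jid)

    def pop(self, worker: str, now: Optional[float] = None) -> Optional[str]:
        """Hand the next job to `worker` and record its lease as one atomic step."""
        now = time.time() if now is None else now
        with self._lock:
            # Requeue jobs whose lease expired (e.g. the worker died),
            # at the front of the queue, in the order they were leased.
            expired = [jid for jid, (_, exp) in self._running.items() if exp <= now]
            for jid in reversed(expired):
                del self._running[jid]
                self._waiting.appendleft(jid)
            if not self._waiting:
                return None
            jid = self._waiting.popleft()
            self._running[jid] = (worker, now + self.lease_seconds)
            return jid

    def complete(self, jid: str, worker: str) -> bool:
        """Mark a job done only if `worker` still holds its lease."""
        with self._lock:
            holder = self._running.get(jid)
            if holder is None or holder[0] != worker:
                return False
            del self._running[jid]
            return True
```

Because the reclaim, pop, and lease bookkeeping all run under one lock, a worker whose lease has expired cannot later complete a job that was handed to another worker.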
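To illustrate how simhash fingerprints support the near-duplicate detection described for simhash-py, the pure-Python sketch below hashes overlapping three-word shingles to 64 bits, lets each hash vote on every bit position, and keeps the bits with a positive total. Similar documents tend to produce fingerprints that differ in only a few bits. The MD5-based feature hash, the shingle width, and the default threshold `k = 3` are assumptions, and `near_duplicates` is a plain linear scan that is only practical for small sets; none of this mirrors simhash-py's API.

```python
import hashlib
import re
from typing import Iterable, List

BITS = 64


def _hash64(feature: str) -> int:
    # First 8 bytes of an MD5 digest, read as an unsigned 64-bit integer.
    return int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest()[:8], "big")


def shingles(text: str, width: int = 3) -> List[str]:
    """Overlapping runs of `width` lowercase words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < width:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + width]) for i in range(len(words) - width + 1)]


def simhash(features: Iterable[str]) -> int:
    """Each feature hash votes +1/-1 on every bit; keep bits with a positive total."""
    totals = [0] * BITS
    for feature in features:
        h = _hash64(feature)
        for i in range(BITS):
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if totals[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def near_duplicates(query: int, fingerprints: Iterable[int], k: int = 3) -> List[int]:
    """Linear scan for fingerprints within k differing bits of `query`."""
    return [f for f in fingerprints if hamming(query, f) <= k]
```

Shingling rather than hashing single words makes the fingerprint sensitive to word order, so two documents that merely share vocabulary are less likely to look like near-duplicates.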
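A Bloom filter, the data structure behind pyreBloom, sets k bit positions for each inserted item and answers membership queries by checking those same positions, so it can return false positives but never false negatives. The sketch below keeps the bits in a local `bytearray` instead of Redis, derives the k positions by double hashing a SHA-256 digest, and sizes the filter with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. `optimal_params` and `BloomFilter` are hypothetical names and do not reflect pyreBloom's interface.

```python
import hashlib
import math
from typing import Iterator, Tuple


def optimal_params(capacity: int, error_rate: float) -> Tuple[int, int]:
    """Standard sizing: m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hashes."""
    m = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    k = max(1, round(m / capacity * math.log(2)))
    return m, k


class BloomFilter:
    """Bloom filter over a local bytearray (pyreBloom keeps its bits in Redis)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

For 1,000 items at a 1% false-positive rate, `optimal_params(1000, 0.01)` returns `(9586, 7)`: about 9.6 bits per item and 7 hash functions.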
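dragnet's own method is described in the cited paper and is not reproduced here. To give a feel for the content-extraction problem, the toy heuristic below splits a page into text blocks at block-level tags, skips `<script>` and `<style>`, and keeps only blocks that are reasonably long and not dominated by link text. Block length and link density are two signals that tend to separate article text from navigation and footers. The tag set, both thresholds, and the names `BlockCollector` and `extract_content` are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Tags treated as block boundaries (an illustrative, incomplete set).
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "td", "article", "section",
              "header", "footer", "nav", "h1", "h2", "h3", "h4", "h5", "h6"}


class BlockCollector(HTMLParser):
    """Split a page into text blocks, counting how many characters are link text."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, int]] = []  # (block text, link characters)
        self._parts: List[str] = []
        self._link_chars = 0
        self._link_depth = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def flush(self) -> None:
        """Close the current block, collapsing whitespace."""
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += len(data.strip())


def extract_content(html: str, min_chars: int = 40, max_link_density: float = 0.3) -> str:
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up any trailing text outside a block tag
    kept = [text for text, link_chars in parser.blocks
            if len(text) >= min_chars and link_chars / len(text) <= max_link_density]
    return "\n".join(kept)
```

On a page whose `<nav>` holds only short links, those blocks fail both tests and are dropped, while a long paragraph with no links is kept. A learned model can weigh many such features jointly instead of relying on two hand-picked cutoffs.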

Skills
  • Languages
    Python, JavaScript, C++, Lua, Ruby
  • Paradigms
    Concurrent Programming, Distributed Programming, Test-driven Development (TDD)
  • Platforms
    Linux
  • Storage
    Redis, Amazon S3, Elasticsearch, HBase, MySQL
  • Misc
    Open Source
  • Libraries/APIs
    jQuery
Education
  • Master's degree in Applied Mathematics and Computational Science
    King Abdullah University of Science and Technology - Thuwal, Saudi Arabia
    2009 - 2010
  • Bachelor's degree in Computer Science
    Colorado School of Mines - Golden, CO
    2004 - 2009