Reuben Firmin, Software Developer in United States

Member since November 9, 2013
Reuben is an experienced software architect and engineer with significant technical and project management experience. He boasts expertise in big data, data warehousing, and scalable and distributed applications. He excels with Java, relational and NoSQL databases, and web technologies.








Preferred Environment

Git, IntelliJ, Linux

The most amazing code I've written cut out all garbage collection from a bulk loader, resulting in a 100x performance gain.


  • Founder

    2013 - PRESENT
    • Founded a consulting and technical recruiting company.
    • Managed a team of 12 freelancers to redesign and rebuild a Perl-based site in Python/Django. Handled architecture, project management, code review, hiring, and bug fixing.
    • Managed a team of 4 freelancers to commercialize the prototype product.
    • Managed a team of freelancers hiring machine learning experts for a financial firm in NYC. Screened all candidates.
    • Led a team of freelancers hiring Ruby engineers for a startup in San Francisco.
    • Handled hands-on, project-based Java development for Catalist & Streamsage.
    Technologies: MySQL, PostgreSQL, EMR, Redshift, AWS EC2, Django, Python, Java
  • Chief Architect

    2010 - 2013
    • Redesigned the data warehouse schema to store client data in a segmented manner, working within Vertica's design limitations while still permitting aggregated analytics.
    • Led a team of 10 developers (including one manager). Owned five APIs, two web applications, and a back-end task processing/scheduling system.
    • Re-architected and led the relaunch of a flagship web application. Rewrote the app using GWT, Spring, and MyBatis. Achieved the core goal of the project: capturing commonalities in data structures among thousands of user-supplied files so that analytics could be performed across aggregate and diverse datasets. Focused on intuitive UX, performance, code quality, and simplicity of maintenance (all of which were lacking in the prior iteration).
    • Mentored junior developers. Conducted weekly one-on-one sessions with all developers. Ran a weekly code review session to bolster confidence among developers by frequently discussing code, sharing knowledge, and improving code quality.
    • Conducted dozens of interviews and hired several developers (and one manager).
    Technologies: Jersey, Guava, Guice, Vertica, Cassandra, PostgreSQL, Java
  • Senior Software Engineer (Contractor)

    2010 - 2010
    • Led the development of a building and deployment system.
    • Led bug-fixing efforts prior to the 1.0 release. Used tcpdump and Wireshark to isolate a performance problem (an interaction between Hibernate and Solr). Rewrote the commercial-cutting code to use an external tool rather than MythTV metadata. Isolated race conditions and refactored threading code to use Futures. Profiled web application code and tuned hot spots remotely.
    • Refactored major sections of the research department's code to make it production ready. Parsed a closed caption file and fixed time anomalies. Parsed output from a forced alignment library and filled in times between matches. Ran a series of natural language processing tools to separate a given video into topic specific segments.
    • Ported the UPenn Super Tagger from C++ into Java. Wrote code to take output from the POS tagger and apply higher level rules to determine sentence-level grammar.
    • Wrote scripts to support the transformation of 21 years of New York Times text into a 30GB table of co-referring words. Optimized the web app providing API access to this table to allow for batch lookups of a given corpus of words.
    Technologies: RPM, C++, Perl, Apache Lucene, Solr, Java
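
The refactoring of threading code to use Futures mentioned above can be illustrated with a minimal sketch. This is not the original project code: the class name FutureSketch, the method totalLength, and the pool size of 4 are all assumptions made for the example. The point it shows is that each task returns its own result through a Future rather than writing to shared state, which removes one common source of race conditions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureSketch {

    /**
     * Hypothetical example: sums the lengths of the given words in parallel.
     * Each task returns a value; no task mutates shared state.
     */
    public static int totalLength(List<String> words) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an arbitrary assumption
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (final String word : words) {
                Callable<Integer> task = () -> word.length();
                pending.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }
}
```

Results are combined only on the calling thread, after each Future completes, so the sum needs no locking.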
  • Engineering Manager

    2007 - 2010
    • Managed a team of 15 developers. Managed a 3-person sysadmin team. Conducted hundreds of interviews.
    • Handled key areas of product strategy. Made major technical contributions to several successful grant applications including a $32 million OSEP award, $50k Mozilla Foundation award, $200k Newcomb award, and $100k TRACE award.
    • Architected and led the relaunch of the site using a technology stack of Java 6, Postgres, Solr/Lucene, Ibatis, Spring, and FreeMarker. Designed the server layout on the managed host. Conducted requirements interviews with key parties. Created a normalized and efficient database schema. Designed many key code subsystems. Implemented the site to handle a million logins and 800k book downloads during 2009 with five nines of uptime.
    • Architected a database-backed distributed file system now storing 3 million files in 3 distinct locations over 6 cross-linked NFS file systems. Wrote DFSck to restore the given file system in a new location and check the validity of the whole DFS.
    • Wrote ebook converters (chapter finding algorithms, copyright and ISBN discovery, remote metadata lookup), a task scheduling subsystem/application, a Solr/Lucene search prototype, a DAO layer framework, an Ehcache-based caching system, and a duck typing proxy.
    Technologies: EhCache, PostgreSQL, Apache Lucene, Solr, Java
  • Lead Software Engineer

    2006 - 2007
    • Worked as principal engineer in the Web Development / R&D team, developing products to enhance a real-time auction engine.
    • Architected and built the new platform for a website rewrite. Based it on a FreeMarker / Spring & Ehcache / Ibatis stack for performance, flexibility, and scalability. Deployed it on 80 Resin Linux web servers.
    • Designed a build and deploy system.
    • Completed a proof-of-concept project evaluating different view technologies, caching models, and MVC frameworks. Considered XML/XSLT rendering both within the browser (AJAX using Prototype) and on the server (using XStream, DOM4J). Used FreeMarker, Velocity, JSP, Struts, Spring, and JSF. Implemented caching of datasets in the browser, on the application server, and in the database. Ran load testing against tiered deployments of the prototype using Siege. Used YourKit for profiling and performance improvement.
    • Introduced best practices and relevant technologies. Introduced Spring as a replacement for Struts for flexibility and improved testability. Introduced Ibatis over raw JDBC calls.
    Technologies: EhCache, IBM DB2, XSLT, XML, Apache Struts, Spring, Java
  • Senior Software Engineer

    2004 - 2006
    • Worked as senior engineer in the Platform/Tools team.
    • Designed and implemented a dynamic DAO and the corresponding database schema for a new CMS platform. Used a reflection and InvocationHandler combination to build dynamic proxies into the DAO layer to provide a simplified API to application developers.
    • Co-architected and implemented a release engineering tool. Managed the entire release process of the large-scale project at CNET. Allowed the automated deployment of 30+ tools to differing sets of up to 50 application servers. Released patch-sets driven by the CVS tag. Built in a workflow process. Included rollback, server locking, audit trail, multi-threaded deployment (threading built with Java 5 concurrency classes), independent server and web application modules communicating via RMI, AJAX interface to CVS source tree, and dynamic deployment progress interface (using DWR). Built the DAO using Ibatis, used Spring for MVC and container, and designed the database schema that was implemented on Sybase. Built high-level transactional support using Spring/AOP. Evangelized the use of clean CSS standards as part of the project.
    • Created a custom solution modifying the CNET build process to allow build-time dependency resolution using existing Ant build scripts. Allowed retrieval of build-time artifact dependencies from CruiseControl producing RPMs of various applications, stored in a central repository from a local Jar repository and/or from remote third-party Maven repositories. Provided the ability to produce and obtain snapshot versions. Avoided using Maven due to CNET's large set of Ant-based build scripts.
    • Implemented various layers within the content metadata management web application. Used metadata to define new types of content.
    Technologies: Java
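
The dynamic DAO approach described above (reflection plus an InvocationHandler behind a simple interface) can be sketched roughly as follows. This is an illustration of the general technique only, not the CNET implementation: ArticleDao, MapBackedHandler, and createDao are hypothetical names, and an in-memory map stands in for the database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class DynamicDaoSketch {

    /** Hypothetical DAO interface; application developers only ever see this. */
    public interface ArticleDao {
        void save(String id, String title);
        String findById(String id);
        int count();
    }

    /** Routes every interface call to an in-memory map (an assumed stand-in for a database). */
    static final class MapBackedHandler implements InvocationHandler {
        private final Map<String, String> rows = new HashMap<>();

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            // Dispatch on the method name discovered via reflection.
            switch (method.getName()) {
                case "save":
                    rows.put((String) args[0], (String) args[1]);
                    return null;
                case "findById":
                    return rows.get((String) args[0]);
                case "count":
                    return rows.size();
                default:
                    // Object methods such as toString are deliberately not handled in this sketch.
                    throw new UnsupportedOperationException(method.getName());
            }
        }
    }

    /** Builds a proxy that implements the given DAO interface with no hand-written class. */
    @SuppressWarnings("unchecked")
    public static <T> T createDao(Class<T> daoInterface) {
        return (T) Proxy.newProxyInstance(
                daoInterface.getClassLoader(),
                new Class<?>[] {daoInterface},
                new MapBackedHandler());
    }

    public static void main(String[] args) {
        ArticleDao dao = createDao(ArticleDao.class);
        dao.save("a1", "Hello");
        System.out.println(dao.findById("a1"));
        System.out.println(dao.count());
    }
}
```

A real handler would presumably translate method names and arguments into queries against the schema; the map keeps the sketch self-contained.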


  • Languages

    Java, Perl, C++, XML, XSLT, Python
  • Tools

    IntelliJ IDEA, YourKit, Solr, RPM, Git
  • Paradigms

    Agile Software Development, Distributed Programming, Concurrent Programming, Functional Programming
  • Storage

    PostgreSQL, Vertica, Cassandra, Redshift, MySQL, IBM DB2
  • Other

    Big Data Architecture, User Experience (UX), EMR, EhCache
  • Platforms

    Linux, AWS EC2
  • Frameworks

    Django, Guice, Jersey, Spring, Apache Struts
  • Libraries/APIs

    Guava, Apache Lucene, jQuery


  • B.Sc. degree in Computer Science
    1995 - 1999
    University of Aberdeen - Scotland
