Artur Brugeman

Web Crawlers Developer in Chelyabinsk, Chelyabinsk Oblast, Russia

Member since August 13, 2014
Artur is a software engineer with a proven ability to develop efficient, robust, and simple back-end solutions for dealing with large data sets. His current focus is on large-scale web crawling and processing of the crawled data.








Preferred Environment

Git, Valgrind, GDB, Emacs, Linux

The most amazing...

...thing I've built is a web crawler that scans billions of web pages daily.


  • Software Engineer and Architect

    2014 - PRESENT
    • Built a web crawler that scans more than 3 billion pages daily, a 100x improvement over the previous system.
    • Implemented a highly efficient HTML parser.
    • Built a custom, robust distributed data store for the inverse link graph, allowing very high write/read throughput. It currently stores 200 billion unique links found on 30 billion crawled pages from 400 million websites.
    • Built an HTTP API to access the link graph database.
    • Built a service that provides real-time ranked suggestions while users type their website URL, handling up to 100K QPS. It uses compact prefix arrays with pre-built result sets, multi-threaded reading, and lock-free data structures for live updates.
    • Built a rating of websites based on the number of linking IP addresses, with filtering by user input. It uses compact suffix arrays for fast filtering, multi-threaded reading, and lock-free data structures for live updates.
    • Rebuilt an API to serve reports on website search engine rankings, and implemented an asynchronous MySQL client library to perform concurrent requests to many servers.
    • Built a new storage system for website search engine rankings, scaling up to 10 billion records with monthly history.
    • Built a service to aggregate various counters from web crawler nodes, storing data on 10 billion domain names.
    • Built an HTTP API to access the search engine rankings database.
    • All of the above were implemented with a focus on cost efficiency: the hardware costs to support all the mentioned services were negligible.
    Technologies: HTTP API, Libcurl, Protobuf, ZeroMQ, MySQL, Go, C++11
  • Software Engineer

    2013 - 2014
    • Created a high-quality rendering back-end for a 3D layer of a world map, using OpenGL, NVIDIA Iray, and NVIDIA Optix.
    • Created a tool to preview 3D models and position them on a world map, using OpenGL and Qt.
    • Made supplementary tools to work with the OpenStreetMap data format (C++, XML).
    • Set up a 2D world map rendering back-end using Mapnik, PostgreSQL, mod_tile, Apache, and MapQuest styles.
    • Extended the PROJ.4 library to add support for isometric projection in Mapnik and more.
    Technologies: Apache, XML, OSGeo, OpenStreetMap API, Mapnik, OpenStreetMap, Linux, Windows, PostgreSQL, MySQL, SQLite, NVIDIA Optix, NVIDIA Iray, OpenGL, Qt, C++, C
  • Software Engineer

    2007 - 2013
    • Created a distributed cache for a proprietary NoSQL database to deliver data closer to processing nodes on big clusters (2013, ZeroMQ, C, C++).
    • Developed a distributed textual search engine for a proprietary NoSQL database, including highly scalable distributed sorting (2012-2013, ZeroMQ, ICU, C, C++).
    • Converted the national fingerprint database of Turkey from five different data formats. Set up a month-long project for automated processing of data on a 300-node cluster (2013, C, Oracle, MS SQL, XML, PHP).
    • Created a decision engine for the criminal division of the biometric passport system of Uzbekistan, integrating three biometric systems into a single solution (2011-2012, Oracle, C, distributed transactions).
    • Refactored a back-end component that aggregated fingerprint search results coming from cluster nodes. Reduced processing time by 10x. Documented the algorithms and wrote the code to be maintainable (2012, C).
    • Created a GUI application to manage data distribution between cluster nodes (2012, C++, Qt).
    • Created a GUI application to control AFIS search processes distributed between cluster nodes (2012, C++, Qt).
    • Created a web application to train, test, and examine students studying the AFIS system (2010-2011, PHP, MySQL, Apache, JavaScript, jQuery).
    • Built a library to perform a multi-criteria evaluation of AFIS search results (2010, C).
    • Designed a database to perform multi-dimensional analysis of daily logs coming from hundreds of AFIS installations. Built an automatic process to import logs into the database (2009, MySQL, C).
    • Created an automated system to collect, extract, translate, and maintain documentation of a ~3M LOC codebase (2009, Doxygen, XSLT, XML, PHP, HTML, JavaScript, Apache).
    • Added support for customizable forms into a textual CRUD application. Refactored the code. Used static analysis to find and remove obsolete code paths (2008, C).
    • Created a viewer for various formats of biometric data, including NIST, EFTS, Papillon, and Interpol (2007, C++, Qt).
    Technologies: Biometrics, Iris Recognition, Facial Recognition, Image Fingerprinting, Image Processing, NIST, Berkeley DB, Microsoft SQL Server, MySQL, Oracle, jQuery, JavaScript, HTML, Doxygen, XSLT, XML, PHP, ZeroMQ, Networking, Multithreading, POSIX, Qt, Linux, C++, C


  • Sobnik

    Sobnik is a Chrome browser extension that detects brokers on Russian real estate bulletin boards. It is actively used by thousands of users. Built using the Chrome API, JavaScript, Go, MongoDB, ZeroMQ, and image processing.


  • Languages

    C++, C, XML, XSLT, HTML, C++11, PHP, Go, SQL-99, JavaScript
  • Libraries/APIs

    HTTP API, Protobuf, Libcurl, ZeroMQ, OpenGL, OpenStreetMap API, POSIX, jQuery, Node.js, Gmail API, Twitter API, FFmpeg
  • Other

    Web Crawlers, Multithreading, Networking, Image Processing, TCP/IP, Non-blocking I/O, Proxy Servers, WebSockets, Arbitrage, NVIDIA Iray, NVIDIA Optix, OpenStreetMap, Mapnik, Doxygen, NIST, Image Fingerprinting, Facial Recognition, Iris Recognition, Biometrics, Lightning Network, OAuth, Bitcoind, P2P, ML Kit, Network Programming
  • Frameworks

    Qt, AngularJS
  • Tools

    V8, Emacs, OSGeo, Apache, Git, Valgrind, GDB
  • Platforms

    Linux, Android, Oracle, Windows
  • Storage

    SQLite, MySQL, Berkeley DB, MongoDB, PostgreSQL, Microsoft SQL Server
  • Paradigms

    Concurrent Programming


  • Bachelor's Degree in Information Technology
    2003 - 2008
    South Ural State University - Chelyabinsk, Russia
