Artur Brugeman, Developer in Chelyabinsk, Chelyabinsk Oblast, Russia

Artur Brugeman

Verified Expert in Engineering

Web Crawlers Developer

Chelyabinsk, Chelyabinsk Oblast, Russia

Toptal member since August 13, 2014

Bio

Artur is a software engineer with a proven ability to develop efficient, robust, and simple back-end solutions for dealing with large data sets. His current focus is on large-scale web crawling and processing of the crawled data.

Portfolio

MegaIndex.com
HTTP API, Libcurl, Protobuf, ZeroMQ, MySQL, Go, C++11
Rustoria.ru
Apache, XML, OSGeo, OpenStreetMap API, Mapnik, OpenStreetMap, Linux, Windows...
Papillon.ru
Biometrics, Facial Recognition, Image Fingerprinting, Image Processing, NIST...

Experience

Availability

Part-time

Preferred Environment

Git, Valgrind, GNU Debugger (GDB), Emacs, Linux

The most amazing...

...thing I've built is a web crawler that scans billions of web pages daily.

Work Experience

Software Engineer and Architect

2014 - PRESENT
MegaIndex.com
  • Built a web crawler that scans more than 3 billion pages daily, a 100x improvement over the previous system.
  • Implemented a highly efficient HTML parser.
  • Built a custom, robust distributed data store for the inverse link graph, allowing very high write and read throughput. It currently stores 200 billion unique links found on 30 billion crawled pages from 400 million websites.
  • Built an HTTP API to access the link graph database.
  • Built a service that returns ranked suggestions in real time while users type their website URL, handling up to 100K QPS. It uses compact prefix arrays with pre-built result sets, multi-threaded reads, and lock-free data structures for live updates.
  • Built a website rating based on the number of linking IP addresses, with filtering by user input. It uses compact suffix arrays for fast filtering, multi-threaded reads, and lock-free data structures for live updates.
  • Rebuilt an API that serves reports on website search engine rankings and implemented an asynchronous MySQL client library to issue concurrent requests to many servers.
  • Built a new storage system for website search engine rankings, scaling up to 10 billion records with monthly history.
  • Built a service to aggregate various counters from web crawler nodes, storing data on 10 billion domain names.
  • Built an HTTP API to access the search engine rankings database.
  • Implemented all of the above with a focus on cost efficiency: the hardware costs to support all the mentioned services were negligible.
Technologies: HTTP API, Libcurl, Protobuf, ZeroMQ, MySQL, Go, C++11

Software Engineer

2013 - 2014
Rustoria.ru
  • Created a high-quality rendering back-end for a 3D layer of a world map, using OpenGL, NVIDIA Iray, and NVIDIA OptiX.
  • Created a tool to preview 3D models and position them on a world map, using OpenGL and Qt.
  • Made supplementary tools to work with the OpenStreetMap data format (C++, XML).
  • Set up a 2D world map rendering back-end using Mapnik, PostgreSQL, mod_tile, Apache, and MapQuest styles.
  • Extended the PROJ.4 library to add support for isometric projection in Mapnik and more.
Technologies: Apache, XML, OSGeo, OpenStreetMap API, Mapnik, OpenStreetMap, Linux, Windows, PostgreSQL, MySQL, SQLite, NVIDIA OptiX, NVIDIA Iray, OpenGL, Qt, C++, C

Software Engineer

2007 - 2013
Papillon.ru
  • Created a distributed cache for a proprietary NoSQL database to deliver data closer to processing nodes on big clusters (2013, ZeroMQ, C, C++).
  • Developed a distributed textual search engine for a proprietary NoSQL database, including highly scalable distributed sorting (2012-2013, ZeroMQ, ICU, C, C++).
  • Converted the national fingerprint database of Turkey from five different data formats. Set up a month-long project for automated processing of data on a 300-node cluster (2013, C, Oracle, MS SQL, XML, PHP).
  • Created a decision engine for the criminal division of the biometric passport system of Uzbekistan, integrating three biometric systems into a single solution (2011-2012, Oracle, C, distributed transactions).
  • Refactored a back-end component that aggregated fingerprint search results coming from cluster nodes. Reduced processing time by 10x. Documented the algorithms and wrote the code to be maintainable (2012, C).
  • Created a GUI application to manage data distribution between cluster nodes (2012, C++, Qt).
  • Created a GUI application to control AFIS search processes distributed between cluster nodes (2012, C++, Qt).
  • Created a web application to train, test, and examine students studying the AFIS system (2010-2011, PHP, MySQL, Apache, JavaScript, jQuery).
  • Built a library to perform a multi-criteria evaluation of AFIS search results (2010, C).
  • Designed a database to perform multi-dimensional analysis of daily logs coming from hundreds of AFIS installations. Built an automatic process to import logs into the database (2009, MySQL, C).
  • Created an automated system to collect, extract, translate, and maintain documentation of a ~3M LOC codebase (2009, Doxygen, XSLT, XML, PHP, HTML, JavaScript, Apache).
  • Added support for customizable forms into a textual CRUD application. Refactored the code. Used static analysis to find and remove obsolete code paths (2008, C).
  • Created a viewer for various formats of biometric data, including NIST, EFTS, Papillon, and Interpol (2007, C++, Qt).
Technologies: Biometrics, Facial Recognition, Image Fingerprinting, Image Processing, NIST, Berkeley DB, Microsoft SQL Server, MySQL, Oracle, jQuery, JavaScript, HTML, Doxygen, XSLT, XML, PHP, ZeroMQ, Networking, Multithreading, POSIX, Qt, Linux, C++, C

Sobnik

Sobnik is an extension for the Chrome browser that detects brokers on Russian real estate bulletin boards. It is actively used by thousands of users. Built using the Chrome API, JavaScript, Go, MongoDB, ZeroMQ, and image processing.

Education

2003 - 2008

Bachelor's Degree in Information Technology

South Ural State University - Chelyabinsk, Russia

Libraries/APIs

HTTP API, Protobuf, Libcurl, ZeroMQ, OpenGL, OpenStreetMap API, POSIX, jQuery, Node.js, Gmail API, X (formerly Twitter) API, FFmpeg

Tools

V8, Emacs, OSGeo, Apache, Git, Valgrind, GNU Debugger (GDB)

Languages

C++, C, XML, XSLT, HTML, C++11, PHP, Go, SQL-99, JavaScript

Frameworks

Qt, NVIDIA OptiX, AngularJS

Platforms

Linux, Android, Oracle, Windows

Storage

SQLite, MySQL, Berkeley DB, MongoDB, PostgreSQL, Microsoft SQL Server

Paradigms

Concurrent Programming

Other

Web Crawlers, Multithreading, Networking, Image Processing, TCP/IP, Non-blocking I/O, Proxy Servers, WebSockets, Arbitrage, NVIDIA Iray, OpenStreetMap, Mapnik, Doxygen, NIST, Image Fingerprinting, Facial Recognition, Biometrics, Lightning Network, OAuth, Bitcoind, P2P, ML Kit, Network Programming
