Artur Brugeman
Verified Expert in Engineering
Web Crawlers Developer
Chelyabinsk, Chelyabinsk Oblast, Russia
Toptal member since August 13, 2014
Artur is a software engineer with a proven ability to develop efficient, robust, and simple back-end solutions for dealing with large data sets. His current focus is on large-scale web crawling and processing of the crawled data.
Preferred Environment
Git, Valgrind, GNU Debugger (GDB), Emacs, Linux
The most amazing...
...thing I've built is a web crawler that scans billions of web pages daily.
Work Experience
Software Engineer and Architect
MegaIndex.com
- Built a web crawler that scans more than 3 billion pages daily, a 100x improvement over the previous system.
- Implemented a highly efficient HTML parser.
- Built a custom, robust distributed data store for the inverse link graph, allowing very high write/read throughput. It currently stores 200 billion unique links found on 30 billion crawled pages from 400 million websites.
- Built an HTTP API to access the link graph database.
- Built a service that returns real-time ranked suggestions as users type their website URL, handling up to 100K QPS. It uses compact prefix arrays with pre-built result sets, multi-threaded reads, and lock-free data structures for live updates.
- Built a rating of websites based on the number of linking IP addresses, with filtering by user input. It uses compact suffix arrays for fast filtering, multi-threaded reads, and lock-free data structures for live updates.
- Rebuilt an API that serves reports on website search engine rankings, and implemented an asynchronous MySQL client library to perform concurrent requests to many servers.
- Built a new storage system for website search engine rankings, scaling up to 10 billion records with monthly history.
- Built a service to aggregate various counters from web crawler nodes, storing data on 10 billion domain names.
- Built an HTTP API to access the search engine rankings database.
- Implemented all of the above with a focus on cost-efficiency: the hardware costs to support all the mentioned services were negligible.
Software Engineer
Rustoria.ru
- Created a high-quality rendering back-end for a 3D layer of a world map, using OpenGL, NVIDIA Iray, and NVIDIA OptiX.
- Created a tool to preview 3D models and position them on a world map, using OpenGL and Qt.
- Made supplementary tools to work with the OpenStreetMap data format (C++, XML).
- Set up a 2D world map rendering back-end using Mapnik, pgSQL, mod_tile, Apache, and MapQuest styles.
- Extended the PROJ.4 library to add support for isometric projection in Mapnik and more.
Software Engineer
Papillon.ru
- Created a distributed cache for a proprietary NoSQL database to deliver data closer to processing nodes on big clusters (2013, ZeroMQ, C, C++).
- Developed a distributed textual search engine for a proprietary NoSQL database, including highly scalable distributed sorting (2012-2013, ZeroMQ, ICU, C, C++).
- Converted the national fingerprint database of Turkey from five different data formats. Set up a month-long project for automated processing of data on a 300-node cluster (2013, C, Oracle, MS SQL, XML, PHP).
- Created a decision engine for the criminal division of the biometric passport system of Uzbekistan, integrating three biometric systems into a single solution (2011-2012, Oracle, C, distributed transactions).
- Refactored a back-end component that aggregated fingerprint search results coming from cluster nodes. Reduced processing time by 10x. Documented the algorithms and wrote the code to be maintainable (2012, C).
- Created a GUI application to manage data distribution between cluster nodes (2012, C++, Qt).
- Created a GUI application to control AFIS search processes distributed between cluster nodes (2012, C++, Qt).
- Created a web application to train, test, and examine students studying the AFIS system (2010-2011, PHP, MySQL, Apache, JavaScript, jQuery).
- Built a library to perform a multi-criteria evaluation of AFIS search results (2010, C).
- Designed a database to perform multi-dimensional analysis of daily logs coming from hundreds of AFIS installations. Built an automatic process to import logs into the database (2009, MySQL, C).
- Created an automated system to collect, extract, translate, and maintain documentation of a ~3M LOC codebase (2009, Doxygen, XSLT, XML, PHP, HTML, JavaScript, Apache).
- Added support for customizable forms to a textual CRUD application. Refactored the code. Used static analysis to find and remove obsolete code paths (2008, C).
- Created a viewer for various formats of biometric data, including NIST, EFTS, Papillon, and Interpol (2007, C++, Qt).
Experience
Sobnik
Education
Bachelor's Degree in Information Technology
South Ural State University - Chelyabinsk, Russia
Skills
Libraries/APIs
HTTP API, Protobuf, Libcurl, ZeroMQ, OpenGL, OpenStreetMap API, POSIX, jQuery, Node.js, Gmail API, X (formerly Twitter) API, FFmpeg
Tools
V8, Emacs, OSGeo, Apache, Git, Valgrind, GNU Debugger (GDB)
Languages
C++, C, XML, XSLT, HTML, C++11, PHP, Go, SQL-99, JavaScript
Frameworks
Qt, NVIDIA OptiX, AngularJS
Platforms
Linux, Android, Oracle, Windows
Storage
SQLite, MySQL, Berkeley DB, MongoDB, PostgreSQL, Microsoft SQL Server
Paradigms
Concurrent Programming
Other
Web Crawlers, Multithreading, Networking, Image Processing, TCP/IP, Non-blocking I/O, Proxy Servers, WebSockets, Arbitrage, NVIDIA Iray, OpenStreetMap, Mapnik, Doxygen, NIST, Image Fingerprinting, Facial Recognition, Biometrics, Lightning Network, OAuth, Bitcoind, P2P, ML Kit, Network Programming