Modernizing Legacy Software: MUD Programming Using Erlang and CloudI
What is Legacy Modernization?
Legacy code is everywhere. And as the rate at which code proliferates continues to increase exponentially, more and more of that code is being relegated to legacy status. In many large organizations, maintenance of legacy systems consumes more than 90% of information systems resources.
The need to modernize legacy code and systems to meet current performance and processing demands is widespread. This post provides a case study of the use of the Erlang programming language, and the Erlang-based CloudI Service Oriented Architecture (SOA), to adapt legacy code – in particular, a decades-old collection of C source code – to the 21st century.
Slaying the Source Code Dragon
Years ago, I was a big fan of text-based multiplayer online games known as Multi-User Dungeons (MUDs). But they were always riddled with performance problems. I decided to dive back into a decades-old pile of C source code and see how we could modernize this legacy code and push these early online games to their limits. At a high level, this project was a great example of using Erlang to adapt legacy software to meet 21st century requirements.
A brief summary:
- The goal: Take an old 50-player-limited MUD video game and push its source code to support thousands upon thousands of simultaneous connections.
- The problem: Legacy, single-threaded C source code.
- The solution: CloudI, an Erlang-based service that provides fault-tolerance and scalability.
What is a Text-Based MUD?
All Massively Multiplayer Online Role Playing Games (MMORPGs) – like World of Warcraft and EverQuest – have developed features whose early origins can be traced back to older text-based multiplayer online games known as Multi-User Dungeons (MUDs).
The first MUD was Roy Trubshaw’s Essex MUD (or MUD1), which was originally developed in 1978 in MACRO-10 assembly language on a DEC PDP-10 and was later converted to BCPL, a predecessor of the C programming language. It ran until 1987. (As you can see, these things are older than most programmers.)
MUDs gradually gained popularity during the late 1980s and early 1990s with various MUD codebases written in C. The DikuMUD codebase, for example, is known as the root of one of the largest trees of derived MUD source code, with at least 51 unique variants all based on the same DikuMUD source code. (During this timeframe, incidentally, MUDs became alternatively known as the “Multi-Undergraduate Destroyer” due to the number of college undergraduates that failed out of school due to their obsession with them.)
The problem with legacy MUDs
Historical C MUD source code (including DikuMUD and its variants) is riddled with performance problems that stem from the limitations that existed when it was written.
Lack of threading
Back then, there was no easily accessible threading library. Moreover, threading would have made the source code more difficult to maintain and modify. As a result, these MUDs were all single-threaded.
During a single “tick” (an increment of the internal clock that tracks the progression of all game events), the MUD source code has to process every game event for every connected socket. In other words: every piece of code slows down the processing of a single tick. And if any computation forces the processing to span longer than a single tick, the MUD lags, impacting every connected player.
With this lag, the game immediately becomes less engaging. Players look on helplessly as their characters die, with their own commands remaining unprocessed.
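To make the cost of a single-threaded tick concrete, here is a minimal sketch in C. It is not taken from the SillyMUD source: the function name and the per-connection cost are illustrative assumptions. It simply adds up the work done for every connection in one tick and reports how far the loop overruns its time budget.

```c
#include <stdint.h>

/* Hypothetical model of a single-threaded tick: the events of every
 * connection are processed one after another within the same tick. */
int64_t tick_overrun_us(int64_t connections,
                        int64_t cost_per_connection_us,
                        int64_t tick_length_us)
{
    int64_t busy_us = connections * cost_per_connection_us;

    /* Work that does not fit inside the tick is lag felt by every player. */
    return (busy_us > tick_length_us) ? (busy_us - tick_length_us) : 0;
}
```

With a 250 millisecond tick (250,000 microseconds), 64 connections costing 1 millisecond each fit comfortably and the function returns 0. At 1,000 connections with the same assumed cost, the loop needs a full second, so `tick_overrun_us(1000, 1000, 250000)` returns 750,000 microseconds of lag.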
For the purposes of this legacy application modernization experiment, I chose SillyMUD, a historical derivative of DikuMUD. This lineage of MUDs has influenced modern MMORPGs, which share some of the same performance problems. During the 1990s, I played a MUD that was derived from the SillyMUD codebase, so I knew the source code would be an interesting and somewhat familiar starting point.
What was I inheriting?
The SillyMUD source code is similar to that of other historical C MUDs in that it is limited to roughly 50 concurrent players (64, to be precise, based on the source code).
However, I noticed that the source code had been modified for performance reasons (i.e., to push its concurrent player limitation). Specifically:
- The source code was missing a domain name lookup on the connection IP address. It had been removed because of the latency that a lookup imposes (normally, an older MUD wants a domain name lookup to make it easier to ban malicious users).
- The source code had its “donate” command disabled (a bit unusual) due to the possible creation of long linked-lists of donated items which then required processing-intensive list traversals. These, in turn, hurt game performance for all other players (single-threaded, remember?).
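The cost of a long linked list is easy to see in a small C sketch. The `struct item` and `find_item` names below are hypothetical and much simpler than SillyMUD's real object structures; the point is only that finding an entry means visiting every node before it, and in a single-threaded server that time is taken from every other player.

```c
#include <stddef.h>

/* Hypothetical item node; SillyMUD's real object structures differ. */
struct item {
    int id;
    struct item *next;
};

/* Walks the list looking for an item with the given id.
 * Returns the matching node or NULL; *visited receives the
 * number of nodes that had to be examined. */
struct item *find_item(struct item *head, int id, size_t *visited)
{
    size_t steps = 0;
    struct item *cur = head;

    while (cur != NULL) {
        steps++;
        if (cur->id == id)
            break;
        cur = cur->next;
    }
    *visited = steps;
    return cur;
}
```

A search for an item that is missing, or that sits at the end of the list, touches every node, so the work grows linearly with the number of donated items.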
CloudI provides a service abstraction (for building a Service Oriented Architecture (SOA)) in Erlang, C/C++, Java, Python, and Ruby, while keeping software faults isolated within the CloudI framework. Fault-tolerance is provided through CloudI’s Erlang implementation, relying on Erlang’s fault-tolerant features and its implementation of the Actor Model. This fault tolerance is a key feature of CloudI’s Erlang implementation, as all software contains bugs.
CloudI also provides an application server to control the lifetime of service execution and the creation of service processes (either as operating system processes for non-Erlang programming languages or as Erlang processes for services implemented in Erlang) so that service execution occurs without external state impacting reliability. For more, see my previous post.
How can CloudI modernize a legacy text-based MUD?
The historical C MUD source code provides an interesting opportunity for CloudI integration given its reliability problems:
- Game server stability directly impacts the appeal of any game mechanics.
- Focusing software development on fixing server stability bugs limits the size and scope of the resulting game.
With CloudI integration, server stability bugs can still be fixed normally, but their impact is limited so that the game server’s operation is not always impacted when a previously undiscovered bug causes an internal game system to fail. This provides a great example of the use of Erlang to enforce fault-tolerance in a legacy codebase.
What changes were required?
The original codebase was written to be both single-threaded and highly dependent on global variables. My goal was to preserve the legacy source code functionality while modernizing it for present day usage.
With CloudI, I was able to keep the source code single-threaded while still providing socket connection scalability.
Let’s review the necessary changes:
The buffering of SillyMUD console output (a terminal display, often connected with Telnet) was already in place, but some direct file descriptor usage did require buffering (so that the console output could become the response to a CloudI service request).
Socket handling in the original source code relied on a select() function call to detect input, errors, and the chance for output, as well as to pause for a game tick of 250 milliseconds before handling pending game events.
The CloudI SillyMUD integration relies on incoming service requests for input while pausing with the C CloudI API’s cloudi_poll function (for the 250 milliseconds before handling the same pending game events). The SillyMUD source code easily ran within CloudI as a CloudI service after being integrated with the C CloudI API (although CloudI provides both C and C++ APIs, using the C API better facilitated integration with SillyMUD’s C source code).
The CloudI integration subscribes to three main service name patterns to handle connect, disconnect, and gameplay events. These name patterns come from the C CloudI API calling subscribe in the integration source code. Accordingly, either WebSocket connections or Telnet connections have service name destinations for sending service requests when connections are established.
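The subscribe-then-dispatch model can be illustrated with a small stand-alone C sketch. To be clear, `mud_subscribe` and `mud_dispatch` are stand-ins written for this example, not the CloudI API, and the names "connect", "disconnect", and "command" are hypothetical placeholders for the real service name patterns, which are not listed here. The sketch only shows the idea: each name is bound to one callback, and an incoming request is routed to the callback registered for its name.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SUBSCRIPTIONS 8

/* A handler receives the request payload and some game state. */
typedef void (*handler_fn)(const char *payload, int *state);

struct subscription {
    const char *name;
    handler_fn handler;
};

static struct subscription subscriptions[MAX_SUBSCRIPTIONS];
static size_t subscription_count = 0;

/* Stand-in for a subscribe call: remember which handler owns a name. */
int mud_subscribe(const char *name, handler_fn handler)
{
    if (subscription_count == MAX_SUBSCRIPTIONS)
        return -1;
    subscriptions[subscription_count].name = name;
    subscriptions[subscription_count].handler = handler;
    subscription_count++;
    return 0;
}

/* Stand-in for request delivery: call the handler registered for name. */
int mud_dispatch(const char *name, const char *payload, int *state)
{
    for (size_t i = 0; i < subscription_count; i++) {
        if (strcmp(subscriptions[i].name, name) == 0) {
            subscriptions[i].handler(payload, state);
            return 0;
        }
    }
    return -1; /* no subscriber for this name */
}

/* Hypothetical handlers; the state here is a count of connected players. */
void on_connect(const char *payload, int *players)    { (void)payload; (*players)++; }
void on_disconnect(const char *payload, int *players) { (void)payload; (*players)--; }
void on_command(const char *payload, int *players)    { (void)payload; (void)players; }
```

In this sketch the game code never touches a socket. It registers three callbacks once, then reacts to whatever requests arrive, which is the property that lets the connection handling live elsewhere.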
The WebSocket and Telnet support in CloudI is provided by internal CloudI services (cloudi_service_http_cowboy for WebSocket support and cloudi_service_tcp for Telnet support). Since internal CloudI services are written in Erlang, they are able to leverage Erlang’s extreme scalability, while at the same time using the CloudI service abstraction that provides the CloudI API functions.
With socket handling moved out of the SillyMUD source code, the game loop no longer spends processing time on socket errors or on situations like link death (in which users are disconnected from the server). Thus, removing low-level socket handling addressed the primary scalability problem.
But scalability problems remain. For example, the MUD uses the filesystem as a local database for both static and dynamic gameplay elements (i.e., players and their progress, along with the world zones, objects and monsters). Refactoring the legacy code of the MUD to instead rely on a CloudI service for a database would provide further fault-tolerance. If we used a database rather than a filesystem, multiple SillyMUD CloudI service processes could be used concurrently as separate game servers, keeping users isolated from runtime errors and reducing downtime.
How much has the MUD improved?
There were three primary areas of improvement:
- Fault tolerance. With the modernized SillyMUD CloudI service integration, the isolation of socket errors and latency from the SillyMUD source code does provide a degree of fault tolerance.
- Connection scalability. With the use of internal CloudI services, the limitation on SillyMUD concurrent users can easily go from 64 (historically) to 16,384 users (with no latency problems!).
- Efficiency and performance. With the connection handling being done within CloudI instead of the single-threaded SillyMUD source code, the efficiency of the SillyMUD gameplay source code is naturally improved and can handle a higher load.
So, with simple CloudI integration, the number of connections scaled by a factor of 256 (from 64 to 16,384), more than two orders of magnitude, while providing fault-tolerance and increasing the efficiency of the same legacy gameplay.
The Bigger Picture
Erlang has provided 99.9999999% uptime (less than 31.536 milliseconds of downtime per year) for production systems. With CloudI, we bring this same reliability to other programming languages and systems.
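The downtime figure follows directly from the uptime percentage. As a quick check, assuming a 365-day year:

```latex
\begin{aligned}
\text{seconds per year} &= 365 \times 24 \times 3600 = 31{,}536{,}000\ \text{s} \\
\text{downtime fraction} &= 1 - 0.999999999 = 10^{-9} \\
\text{downtime per year} &= 10^{-9} \times 31{,}536{,}000\ \text{s} = 0.031536\ \text{s} = 31.536\ \text{ms}
\end{aligned}
```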
Beyond proving the viability of this approach for improving stagnant legacy game server source code (SillyMUD was last modified over 20 years ago in 1993!), this project demonstrates on a broader level how Erlang and CloudI can be leveraged to modernize legacy applications and provide fault-tolerance, improved performance, and high availability in general. These results hold promising potential for adapting legacy code to the 21st century without requiring a major software overhaul.
About the author
Michael is a distributed systems and fault tolerance expert, having worked with AT&T, E*Trade, Nokia and others.