Creating a Ruby DSL: A Guide to Advanced Metaprogramming
Write better Ruby code by leveraging its metaprogramming features. In this article, you will learn how to create your own Ruby domain specific language (DSL).
Máté is a full-stack software engineer with 8+ years of experience in JavaScript, Node.js, Angular, Ruby on Rails and PHP.
Domain specific languages (DSLs) are an incredibly powerful tool for making it easier to program or configure complex systems. They are also everywhere: as a software engineer, you are most likely using several different DSLs on a daily basis.
In this article, you will learn what domain specific languages are, when they should be used, and finally how you can make your very own DSL in Ruby using advanced metaprogramming techniques.
This article builds upon Nikola Todorovic’s introduction to Ruby metaprogramming, also published on the Toptal Blog. So if you are new to metaprogramming, make sure you read that first.
What Is a Domain Specific Language?
The general definition of DSLs is that they are languages specialized to a particular application domain or use case. This means that you can only use them for specific things—they are not suitable for general-purpose software development. If that sounds broad, that’s because it is—DSLs come in many different shapes and sizes. Here are a few important categories:
- Markup languages such as HTML and CSS are designed for describing specific things like the structure, content, and styles of web pages. It is not possible to write arbitrary algorithms with them, so they fit the description of a DSL.
- Macro and query languages (e.g., SQL) sit on top of a particular system or another programming language and are usually limited in what they can do. Therefore they obviously qualify as domain specific languages.
- Many DSLs do not have their own syntax—instead, they use the syntax of an established programming language in a clever way that feels like using a separate mini-language.
This last category is called an internal DSL, and it is one of these that we are going to create as an example very soon. But before we get into that, let’s take a look at a few well-known examples of internal DSLs. The route definition syntax in Rails is one of them:
Rails.application.routes.draw do
  root to: "pages#main"
  resources :posts do
    get :preview
    resources :comments, only: [:new, :create, :destroy]
  end
end
This is Ruby code, yet it feels more like a custom route definition language, thanks to the various metaprogramming techniques that make such a clean, easy-to-use interface possible. Notice that the structure of the DSL is implemented using Ruby blocks, and method calls such as `get` and `resources` are used for defining the keywords of this mini-language.
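To make the "blocks for structure, method calls for keywords" idea concrete, here is a toy sketch. It is not how Rails implements routing; `TinyRouter` and its methods are hypothetical names invented for this illustration.

```ruby
# Toy sketch only: TinyRouter is a hypothetical class, not part of Rails.
class TinyRouter
  attr_reader :routes

  def initialize
    @routes = []
    @prefix = ""
  end

  # Runs the block with `self` set to the router, so bare calls
  # such as `get` inside the block resolve to the methods below.
  def draw(&block)
    instance_eval(&block)
    self
  end

  # "Keyword" that records a GET route under the current prefix.
  def get(name)
    @routes << "GET #{@prefix}/#{name}"
  end

  # "Keyword" that nests routes: the block gives the DSL its structure.
  def resources(name, &block)
    @routes << "GET #{@prefix}/#{name}"
    return unless block

    previous = @prefix
    @prefix = "#{@prefix}/#{name}"
    instance_eval(&block)
    @prefix = previous
  end
end

router = TinyRouter.new.draw do
  resources :posts do
    get :preview
  end
end

router.routes # => ["GET /posts", "GET /posts/preview"]
```

The nesting of the blocks is what carries the hierarchy; the method names only supply the vocabulary.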
Metaprogramming is used even more heavily in the RSpec testing library:
describe UsersController, type: :controller do
  before do
    allow(controller).to receive(:current_user).and_return(nil)
  end
  describe "GET #new" do
    subject { get :new }
    it "returns success" do
      expect(subject).to be_success
    end
  end
end
This piece of code also contains examples for fluent interfaces, which allow declarations to be read out loud as plain English sentences, making it a lot easier to understand what the code is doing:
# Stubs the `current_user` method on `controller` to always return `nil`
allow(controller).to receive(:current_user).and_return(nil)
# Asserts that `subject.success?` is truthy
expect(subject).to be_success
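A fluent interface usually comes down to one small trick: every chainable method returns an object that the next call can be sent to, often `self`. The following is a minimal sketch of that trick, using a hypothetical `QueryBuilder` class that is unrelated to RSpec or ActiveRecord.

```ruby
# Hypothetical example class; the names are made up for illustration.
class QueryBuilder
  def initialize(table)
    @table = table
    @conditions = []
    @limit = nil
  end

  # Each builder method records its input and returns `self`,
  # which is what makes the calls chainable.
  def where(condition)
    @conditions << condition
    self
  end

  def limit(count)
    @limit = count
    self
  end

  # Terminal method: turns the recorded state into a string.
  def to_sql
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE #{@conditions.join(' AND ')}" unless @conditions.empty?
    sql += " LIMIT #{@limit}" if @limit
    sql
  end
end

sql = QueryBuilder.new("posts").where("status = 'published'").limit(10).to_sql
sql # => "SELECT * FROM posts WHERE status = 'published' LIMIT 10"
```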
Another example of a fluent interface is the query interface of ActiveRecord and Arel, which uses an abstract syntax tree internally for building complex SQL queries:
Post.                               # =>
  select([                          #   SELECT
    Post[Arel.star],                #     `posts`.*,
    Comment[:id].count.             #     COUNT(`comments`.`id`)
      as("num_comments"),           #       AS num_comments
  ]).                               #   FROM `posts`
  joins(:comments).                 #   INNER JOIN `comments`
                                    #     ON `comments`.`post_id` = `posts`.`id`
  where.not(status: :draft).        #   WHERE `posts`.`status` <> 'draft'
  where(                            #   AND
    Post[:created_at].lte(Time.now) #     `posts`.`created_at` <=
  ).                                #       '2017-07-01 14:52:30'
  group(Post[:id])                  #   GROUP BY `posts`.`id`
Although the clean and expressive syntax of Ruby along with its metaprogramming capabilities makes it uniquely suited for building domain specific languages, DSLs exist in other languages as well. Here is an example of a JavaScript test using the Jasmine framework:
describe("Helper functions", function() {
  beforeEach(function() {
    this.helpers = window.helpers;
  });
  describe("log error", function() {
    it("logs error message to console", function() {
      spyOn(console, "log").and.returnValue(true);
      this.helpers.log_error("oops!");
      expect(console.log).toHaveBeenCalledWith("ERROR: oops!");
    });
  });
});
This syntax is perhaps not as clean as that of the Ruby examples, but it shows that with clever naming and creative use of the syntax, internal DSLs can be created using almost any language.
The benefit of internal DSLs is that they don’t require a separate parser, which can be notoriously difficult to implement properly. And because they use the syntax of the language they are implemented in, they also integrate seamlessly with the rest of the codebase.
What we have to give up in return is syntactic freedom—internal DSLs have to be syntactically valid in their implementation language. How much you have to compromise in this regard depends largely on the selected language, with verbose, statically typed languages such as Java and VB.NET being on one end of the spectrum, and dynamic languages with extensive metaprogramming capabilities such as Ruby on the other end.
Building Our Own—A Ruby DSL for Class Configuration
The example DSL we are going to build in Ruby is a reusable configuration engine for specifying the configuration attributes of a Ruby class using a very simple syntax. Adding configuration capabilities to a class is a very common requirement in the Ruby world, especially when it comes to configuring external gems and API clients. The usual solution is an interface like this:
MyApp.configure do |config|
  config.app_id = "my_app"
  config.title = "My App"
  config.cookie_name = "my_app_session"
end
Let’s implement this interface first—and then, using it as the starting point, we can improve it step by step by adding more features, cleaning up the syntax, and making our work reusable.
What do we need to make this interface work? The `MyApp` class should have a `configure` class method that takes a block and then executes that block by yielding to it, passing in a configuration object that has accessor methods for reading and writing the configuration values:
class MyApp
  # ...
  class << self
    def config
      @config ||= Configuration.new
    end
    def configure
      yield config
    end
  end
  class Configuration
    attr_accessor :app_id, :title, :cookie_name
  end
end
Once the configuration block has run, we can easily access and modify the values:
MyApp.config
=> #<MyApp::Configuration:0x2c6c5e0 @app_id="my_app", @title="My App", @cookie_name="my_app_session">
MyApp.config.title
=> "My App"
MyApp.config.app_id = "not_my_app"
=> "not_my_app"
So far, this implementation does not feel enough like a custom language to be considered a DSL. But let’s take things one step at a time. Next, we will decouple the configuration functionality from the `MyApp` class and make it generic enough to be usable in many different use cases.
Making It Reusable
Right now, if we wanted to add similar configuration capabilities to a different class, we would have to copy both the `Configuration` class and its related setup methods into that other class, as well as edit the `attr_accessor` list to change the accepted configuration attributes. To avoid having to do this, let’s move the configuration features into a separate module called `Configurable`. With that, our `MyApp` class will look like this:
class MyApp
  include Configurable
  # ...
end
Everything related to configuration has been moved to the `Configurable` module:
module Configurable
  def self.included(host_class)
    host_class.extend ClassMethods
  end
  module ClassMethods
    def config
      @config ||= Configuration.new
    end
    def configure
      yield config
    end
  end
  class Configuration
    attr_accessor :app_id, :title, :cookie_name
  end
end
Not much has changed here, except for the new `self.included` method. We need this method because including a module only mixes in its instance methods, so our `config` and `configure` class methods will not be added to the host class by default. However, if we define a special method called `included` on a module, Ruby will call it whenever that module is included in a class. There we can manually extend the host class with the methods in `ClassMethods`:
def self.included(host_class)     # called when we include the module in `MyApp`
  host_class.extend ClassMethods  # adds our class methods to `MyApp`
end
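The difference is easy to verify in isolation. The sketch below uses hypothetical module and class names (`Greeter`, `PlainMixin`, `WithHook`, `WithoutHook`) to contrast a module that uses the `included` hook with one that merely defines a method on itself.

```ruby
# All names here are hypothetical and exist only for this demonstration.
module Greeter
  # Ruby calls this hook each time the module is included somewhere.
  def self.included(host_class)
    host_class.extend ClassMethods
  end

  module ClassMethods
    def greeting
      "class-level hello"
    end
  end

  # Ordinary instance method: mixed in by `include` as usual.
  def greet
    "hi"
  end
end

module PlainMixin
  # Defined on the module object itself, so `include` does not copy it.
  def self.shout
    "HEY"
  end
end

class WithHook
  include Greeter
end

class WithoutHook
  include PlainMixin
end

WithHook.greeting               # => "class-level hello"
WithHook.new.greet              # => "hi"
WithoutHook.respond_to?(:shout) # => false
```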
We are not done yet. Our next step is to make it possible to specify the supported attributes in the host class that includes the `Configurable` module. A solution like this would look nice:
class MyApp
  include Configurable.with(:app_id, :title, :cookie_name)
  # ...
end
Perhaps somewhat surprisingly, the code above is syntactically correct: `include` is not a keyword but simply a regular method that expects a `Module` object as its parameter. As long as we pass it an expression that returns a `Module`, it will happily include it. So, instead of including `Configurable` directly, we need a method named `with` on it that generates a new module customized with the specified attributes:
module Configurable
  def self.with(*attrs)
    # Define anonymous class with the configuration attributes
    config_class = Class.new do
      attr_accessor *attrs
    end
    # Define anonymous module for the class methods to be "mixed in"
    class_methods = Module.new do
      define_method :config do
        @config ||= config_class.new
      end
      def configure
        yield config
      end
    end
    # Create and return new module
    Module.new do
      singleton_class.send :define_method, :included do |host_class|
        host_class.extend class_methods
      end
    end
  end
end
There is a lot to unpack here. The entire `Configurable` module now consists of just a single `with` method, with everything happening within that method. First, we create a new anonymous class with `Class.new` to hold our attribute accessor methods. Because `Class.new` takes the class definition as a block and blocks have access to outside variables, we are able to pass the `attrs` variable to `attr_accessor` without problems.
def self.with(*attrs)           # `attrs` is created here
  # ...
  config_class = Class.new do   # class definition passed in as a block
    attr_accessor *attrs        # we have access to `attrs` here
  end
The fact that blocks in Ruby have access to outside variables is also the reason why they are sometimes called closures, as they include, or “close over”, the outside environment that they were defined in. Note that I used the phrase “defined in” and not “executed in”. That’s correct: regardless of when and where our `define_method` blocks will eventually be executed, they will always be able to access the variables `config_class` and `class_methods`, even after the `with` method has finished running and returned. The following example demonstrates this behavior:
def create_block
  foo = "hello"            # define local variable
  return Proc.new { foo }  # return a new block that returns `foo`
end
block = create_block       # call `create_block` to retrieve the block
block.call                 # even though `create_block` has already returned,
=> "hello"                 # the block can still return `foo` to us
Now that we know about this neat behavior of blocks, we can go ahead and define an anonymous module in `class_methods` for the class methods that will be added to the host class when our generated module is included. Here we have to use `define_method` to define the `config` method, because we need access to the outside `config_class` variable from within the method. Defining the method using the `def` keyword would not give us that access because regular method definitions with `def` are not closures. However, `define_method` takes a block, so this will work:
config_class = # ...               # `config_class` is defined here
# ...
class_methods = Module.new do      # define new module using a block
  define_method :config do         # method definition with a block
    @config ||= config_class.new   # even two blocks deep, we can still
  end                              # access `config_class`
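The contrast between `def` and `define_method` can be checked with a small experiment. In this sketch, `build_module`, `secret`, `with_def`, and `with_block` are hypothetical names chosen for the demonstration.

```ruby
# Hypothetical helper that builds a module with two methods,
# one defined with `def` and one with `define_method`.
def build_module
  secret = "captured"

  Module.new do
    # `def` starts a fresh scope, so `secret` is not visible in the body
    # and Ruby raises NameError when the method runs.
    def with_def
      secret
    rescue NameError
      "not visible"
    end

    # The block passed to `define_method` is a closure over `secret`.
    define_method :with_block do
      secret
    end
  end
end

probe = Object.new.extend(build_module)
probe.with_def   # => "not visible"
probe.with_block # => "captured"
```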
Finally, we call `Module.new` to create the module that we are going to return. Here we need to define our `self.included` method, but unfortunately we cannot do that with the `def` keyword, as the method needs access to the outside `class_methods` variable. Therefore, we have to use `define_method` with a block again, but this time on the singleton class of the module, as we are defining a method on the module instance itself. Oh, and since `define_method` was a private method before Ruby 2.5, we use `send` to invoke it instead of calling it directly (on newer versions a direct call also works, and `send` remains harmless):
class_methods = # ...
# ...
Module.new do
  singleton_class.send :define_method, :included do |host_class|
    host_class.extend class_methods    # the block has access to `class_methods`
  end
end
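As an aside, Ruby also offers `define_singleton_method`, which defines a method directly on the receiver and takes a block in the same way. The following sketch shows it as one possible alternative to the `singleton_class.send` call; it is not the approach used in the rest of this article, and `Widget` and `configured?` are hypothetical names.

```ruby
class_methods = Module.new do
  def configured?
    true
  end
end

# `define_singleton_method` defines `included` on the module object itself,
# and its block still closes over the `class_methods` local variable.
hook_module = Module.new do
  define_singleton_method :included do |host_class|
    host_class.extend class_methods
  end
end

# `Class.new` with a block is used because a `class ... end` body
# cannot see the `hook_module` local variable.
Widget = Class.new do
  include hook_module
end

Widget.configured? # => true
```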
Phew, that was some pretty hardcore metaprogramming already. But was the added complexity worth it? Take a look at how easy it is to use and decide for yourself:
class SomeClass
  include Configurable.with(:foo, :bar)
  # ...
end
SomeClass.configure do |config|
  config.foo = "wat"
  config.bar = "huh"
end
SomeClass.config.foo
=> "wat"
But we can do even better. In the next step we will clean up the syntax of the `configure` block a little bit to make our module even more convenient to use.
Cleaning Up the Syntax
There is one last thing that is still bothering me with our current implementation: we have to repeat `config` on every single line in the configuration block. A proper DSL would know that everything within the `configure` block should be executed in the context of our configuration object and enable us to achieve the same thing with just this:
MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name "my_app_session"
end
Let’s implement it, shall we? From the looks of it, we will need two things. First, we need a way to execute the block passed to `configure` in the context of the configuration object so that method calls within the block go to that object. Second, we have to change the accessor methods so that they write the value if an argument is provided to them and read it back when called without an argument. A possible implementation looks like this:
module Configurable
  def self.with(*attrs)
    not_provided = Object.new
    config_class = Class.new do
      attrs.each do |attr|
        define_method attr do |value = not_provided|
          if value === not_provided
            instance_variable_get("@#{attr}")
          else
            instance_variable_set("@#{attr}", value)
          end
        end
      end
      attr_writer *attrs
    end
    class_methods = Module.new do
      # ...
      def configure(&block)
        config.instance_eval(&block)
      end
    end
    # Create and return new module
    # ...
  end
end
The simpler change here is running the `configure` block in the context of the configuration object. Calling Ruby’s `instance_eval` method on an object lets you execute an arbitrary block of code as if it were running within that object, which means that when the configuration block calls the `app_id` method on the first line, that call will go to our configuration class instance.
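A small standalone experiment shows what `instance_eval` changes and what it leaves alone. `Recorder` is a hypothetical class used only for this sketch: inside the block, `self` becomes the receiver, while local variables from the surrounding code remain visible because the block is still a closure.

```ruby
# Hypothetical class that simply records the calls it receives.
class Recorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def app_id(value)
    @calls << [:app_id, value]
  end
end

recorder = Recorder.new
suffix = "_app"          # local variables stay visible inside the block
self_inside = nil

recorder.instance_eval do
  self_inside = self     # `self` is now the recorder
  app_id "my#{suffix}"   # implicit receiver: calls recorder.app_id
end

self_inside.equal?(recorder) # => true
recorder.calls               # => [[:app_id, "my_app"]]
```

One consequence worth keeping in mind is that methods and instance variables of the code surrounding the block are no longer reachable through the implicit receiver, since `self` now points at the configuration object.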
The change to the attribute accessor methods in `config_class` is a bit more complicated. To understand it, we need to first understand what exactly `attr_accessor` was doing behind the scenes. Take the following `attr_accessor` call for example:
class SomeClass
  attr_accessor :foo, :bar
end
This is equivalent to defining a reader and writer method for each specified attribute:
class SomeClass
  def foo
    @foo
  end
  def foo=(value)
    @foo = value
  end
  # and the same with `bar`
end
So when we wrote `attr_accessor *attrs` in the original code, Ruby defined the attribute reader and writer methods for us for every attribute in `attrs`. That is, we got the following standard accessor methods: `app_id`, `app_id=`, `title`, `title=`, and so on. In our new version, we want to keep the standard writer methods so that assignments like this still work properly:
MyApp.config.app_id = "not_my_app"
=> "not_my_app"
We can keep auto-generating the writer methods by calling `attr_writer *attrs`. However, we can no longer use the standard reader methods, as they also have to be capable of writing the attribute to support this new syntax:
MyApp.configure do
  app_id "my_app"   # assigns a new value
  app_id            # reads the stored value
end
To generate the reader methods ourselves, we loop over the `attrs` array and define a method for each attribute that returns the current value of the matching instance variable if no new value is provided and writes the new value if it is specified:
not_provided = Object.new
# ...
attrs.each do |attr|
  define_method attr do |value = not_provided|
    if value === not_provided
      instance_variable_get("@#{attr}")
    else
      instance_variable_set("@#{attr}", value)
    end
  end
end
Here we use Ruby’s `instance_variable_get` method to read an instance variable with an arbitrary name, and `instance_variable_set` to assign a new value to it. Unfortunately the variable name must be prefixed with an “@” sign in both cases, hence the string interpolation.
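The behavior of these two methods is easy to try out on a bare object. In this sketch, `Box` is a hypothetical empty class; the last part shows what happens when the “@” prefix is left out.

```ruby
class Box; end   # hypothetical empty class for the demonstration

box = Box.new
attr = :title

box.instance_variable_set("@#{attr}", "My App")
stored  = box.instance_variable_get("@#{attr}")   # => "My App"
missing = box.instance_variable_get("@missing")   # => nil (never assigned)
names   = box.instance_variables                  # => [:@title]

# Without the "@" prefix, Ruby rejects the name with a NameError.
error_class =
  begin
    box.instance_variable_get("title")
  rescue NameError => e
    e.class
  end

error_class # => NameError
```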
You might be wondering why we have to use a blank object as the default value for “not provided” and why we can’t simply use `nil` for that purpose. The reason is simple: `nil` is a valid value that someone might want to set for a configuration attribute. If we tested for `nil`, we would not be able to tell these two scenarios apart:
MyApp.configure do
  app_id nil   # expectation: assigns nil
  app_id       # expectation: returns current value
end
That blank object stored in `not_provided` is only ever going to be equal to itself, so this way we can be certain that nobody is going to pass it into our method and cause an unintended read instead of a write.
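The difference between the two defaults can be seen side by side. This sketch uses two hypothetical classes and a hypothetical `NOT_PROVIDED` constant; it compares with `equal?` (object identity), which for a plain `Object` gives the same result as the `===` check used in the module.

```ruby
# Hypothetical names, for illustration only.
NOT_PROVIDED = Object.new

class NilDefault
  # Uses nil to mean "no argument given".
  def app_id(value = nil)
    value.nil? ? @app_id : (@app_id = value)
  end
end

class SentinelDefault
  # Uses a unique object to mean "no argument given".
  def app_id(value = NOT_PROVIDED)
    if value.equal?(NOT_PROVIDED)
      @app_id
    else
      @app_id = value
    end
  end
end

a = NilDefault.new
a.app_id "my_app"
a.app_id nil   # treated as a read, so the value is not cleared
a.app_id       # => "my_app"

b = SentinelDefault.new
b.app_id "my_app"
b.app_id nil   # treated as a write
b.app_id       # => nil
```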
Adding Support for References
There is one more feature that we could add to make our module even more versatile—the ability to reference a configuration attribute from another one:
MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name { "#{app_id}_session" }
end
MyApp.config.cookie_name
=> "my_app_session"
Here we added a reference from `cookie_name` to the `app_id` attribute. Note that the expression containing the reference is passed in as a block. This is necessary in order to support the delayed evaluation of the attribute value. The idea is to only evaluate the block later when the attribute is read and not when it is defined. Otherwise funny things would happen if we defined the attributes in the “wrong” order:
SomeClass.configure do
  foo "#{bar}_baz"   # expression evaluated here
  bar "hello"
end
SomeClass.config.foo
=> "_baz"            # not actually funny
If the expression is wrapped in a block, that will prevent it from being evaluated right away. Instead, we can save the block to be executed later when the attribute value is retrieved:
SomeClass.configure do
  foo { "#{bar}_baz" }   # stores block, does not evaluate it yet
  bar "hello"
end
SomeClass.config.foo     # `foo` evaluated here
=> "hello_baz"           # correct!
We do not have to make big changes to the `Configurable` module to add support for delayed evaluation using blocks. In fact, we only have to change the attribute method definition:
define_method attr do |value = not_provided, &block|
  if value === not_provided && block.nil?
    result = instance_variable_get("@#{attr}")
    result.is_a?(Proc) ? instance_eval(&result) : result
  else
    instance_variable_set("@#{attr}", block || value)
  end
end
When setting an attribute, the `block || value` expression saves the block if one was passed in; otherwise it saves the value. Then, when the attribute is later read, we check whether the stored value is a block. If it is, we evaluate it using `instance_eval`; if it is not, we return it like we did before.
Supporting references comes with its own caveats and edge cases, of course. For example, you can probably figure out what happens if you read any of the attributes in this configuration:
SomeClass.configure do
  foo { bar }
  bar { foo }
end
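With the module as written, reading either attribute makes the two blocks call each other until Ruby gives up with a `SystemStackError`. One possible way to fail faster, with a clearer message, is to remember which attributes are currently being evaluated. The sketch below shows that idea on a simplified, standalone class; `LazyConfig`, `CircularReferenceError`, `set`, and `get` are hypothetical names and not part of the `Configurable` module.

```ruby
# Hypothetical error class and guard; not part of the article's module.
class CircularReferenceError < StandardError; end

class LazyConfig
  def initialize
    @values = {}
    @resolving = []   # names of attributes currently being evaluated
  end

  # Stores either a plain value or a block to evaluate later.
  def set(name, value = nil, &block)
    @values[name] = block || value
  end

  def get(name)
    value = @values[name]
    return value unless value.is_a?(Proc)

    # Seeing the same name again while it is still being resolved
    # means the blocks reference each other in a cycle.
    if @resolving.include?(name)
      chain = (@resolving + [name]).join(" -> ")
      raise CircularReferenceError, "circular reference: #{chain}"
    end

    @resolving.push(name)
    begin
      instance_exec(&value)
    ensure
      @resolving.pop
    end
  end
end

config = LazyConfig.new
config.set(:foo) { get(:bar) }
config.set(:bar) { get(:foo) }

message =
  begin
    config.get(:foo)
  rescue CircularReferenceError => e
    e.message
  end

message # => "circular reference: foo -> bar -> foo"
```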
The Finished Module
In the end, we have got ourselves a pretty neat module for making an arbitrary class configurable and then specifying those configuration values using a clean and simple DSL that also lets us reference one configuration attribute from another:
class MyApp
  include Configurable.with(:app_id, :title, :cookie_name)
  # ...
end
MyApp.configure do
  app_id "my_app"
  title "My App"
  cookie_name { "#{app_id}_session" }
end
Here is the final version of the module that implements our DSL—a total of 36 lines of code:
module Configurable
  def self.with(*attrs)
    not_provided = Object.new

    config_class = Class.new do
      attrs.each do |attr|
        define_method attr do |value = not_provided, &block|
          if value === not_provided && block.nil?
            result = instance_variable_get("@#{attr}")
            result.is_a?(Proc) ? instance_eval(&result) : result
          else
            instance_variable_set("@#{attr}", block || value)
          end
        end
      end

      attr_writer *attrs
    end

    class_methods = Module.new do
      define_method :config do
        @config ||= config_class.new
      end

      def configure(&block)
        config.instance_eval(&block)
      end
    end

    Module.new do
      singleton_class.send :define_method, :included do |host_class|
        host_class.extend class_methods
      end
    end
  end
end
Looking at all this Ruby magic in a piece of code that is nearly unreadable and therefore very hard to maintain, you might wonder if all this effort was worth it just to make our domain specific language a little bit nicer. The short answer is that it depends—which brings us to the final topic of this article.
Ruby DSLs—When to Use and When Not to Use Them
You have probably noticed while reading the implementation steps of our DSL that, as we made the external facing syntax of the language cleaner and easier to use, we had to use an ever increasing number of metaprogramming tricks under the hood to make it happen. This resulted in an implementation that will be incredibly hard to understand and modify in the future. Like so many other things in software development, this is also a tradeoff that must be carefully examined.
For a domain specific language to be worth its implementation and maintenance cost, it must bring an even greater sum of benefits to the table. This is usually achieved by making the language reusable in as many different scenarios as possible, thereby amortizing the total cost between many different use cases. Frameworks and libraries are more likely to contain their own DSLs exactly because they are used by lots of developers, each of whom can enjoy the productivity benefits of those embedded languages.
So, as a general principle, only build DSLs if you, other developers, or the end users of your application will be getting a lot of use out of them. If you do create a DSL, make sure to include a comprehensive test suite with it, as well as properly document its syntax as it can be very hard to figure out from the implementation alone. Future you and your fellow developers will thank you for it.
Máté Solymosi
Zug, Switzerland
Member since July 13, 2015