Refactoring: Improving the Design of Existing Code (My Notes and Highlights)

Note: What follows are not my thoughts, opinions and interpretations, but just a copy + paste of my notes and highlights taken straight from the book.

What is Refactoring?

Refactoring is the process of changing a software system in a way that does not alter the external behavior of the code yet improves its internal structure. In essence, when you refactor, you are improving the design of the code after it has been written.

Refactoring (noun): a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.

Refactoring (verb): to restructure software by applying a series of refactorings without changing its observable behavior.

A poorly designed system is hard to change—because it is difficult to figure out what to change and how these changes will interact with the existing code to get the behavior you want. And if it is hard to figure out what to change, there is a good chance that I will make mistakes and introduce bugs.

With refactoring, we can take a bad, even chaotic, design and rework it into well-structured code. Each step is simple—even simplistic. Yet the cumulative effect of these small changes can radically improve the design. It is the exact reverse of the notion of software decay.

Design, rather than occurring all up front, occurs continuously during development. As you build the system, you learn how to improve the design.

When you have to add a feature to a program but the code is not structured in a convenient way, first refactor the program to make it easy to understand and to add the feature, then add the feature. - Kent Beck

Why Refactor?

Improves the design of software over time.
Makes the codebase easier to understand and reason about.
Helps you find bugs and program faster.

Test before Refactoring

Make sure you have a solid set of tests for the piece of code you are refactoring. Tests are essential because the larger and more complex the code, the more likely it is that your changes will break something.

Run your tests after each change you make. Testing after each change means that when you make a mistake, you only have a small change to consider in order to spot the error, which makes it far easier to find and fix. This is the essence of the refactoring process: small changes and testing after each change.

Two Types of Understanding

When you are coding, you can have two types of understanding about what the code is doing.

The understanding in your head. This is fragile and temporary, residing in your short-term memory. You read code, you understand, but you forget about it a few days later. When you come back to the same code after some time, you have to again make an effort to understand the code.
The long-term understanding, which lives outside your head. You persist the first type of understanding by moving it from your head to the codebase itself. By doing this, the code tells you what it's doing, and you don't have to figure it out again.

The way you do this is by extracting that piece of code into a new function and giving it an intention-revealing name: a name that tells you what it's doing. You can always go to that function to see how it's doing it, but you don't have to. You just abstracted that code to something simpler. The next time you are reading the code, you just need to glance at the name to understand what it's doing.

💡

Read the code, gain some insight, and use refactoring to move that insight from your head back into the code. The clearer code then makes it easier to understand it, leading to deeper insights and a beneficial positive feedback loop.

The key to effective refactoring is recognizing that you go faster when you take tiny steps, the code is never broken, and you can compose those small steps into substantial changes.

Before we look at various refactoring techniques, we must know what we should refactor first. Martin provides a set of code smells you should watch out for in your codebase. Let's look at each.

Code Smells

Deciding when to start refactoring, and when to stop, is just as important to refactoring as knowing how to operate the mechanics of it. Here are a few code smells you should watch out for in your code.

Mysterious Name

One of the most important parts of clear code is good names, so put a lot of thought into naming functions, modules, variables, classes so they clearly communicate what they do and how to use them.

Duplicated Code

Duplicated code is harder to change. If you see the same code structure in more than one place, unify them in a single location.

Long Function

The longer a function is, the more difficult it is to understand. The programs that live best and longest are those with short functions.

Long Parameter List

Long parameter lists make the function difficult to understand. If you're passing multiple related parameters frequently, combine them into an object and pass that object instead. Also, rather than pulling lots of data out of an existing object and passing them separately, preserve the whole object.

Global Data

The problem with global, mutable data is that it can be modified from anywhere in the codebase, and there's no mechanism to discover which bit of code may have touched it. Fix it by first encapsulating the global variable in a function and then moving it within a class or a module.

Mutable Data

Changes to the underlying data can often lead to unexpected consequences and tricky bugs. To prevent this, encapsulate data within a function, module, or a class and restrict access to it.

Shotgun Surgery

Every time you have to make a change, you have to make a lot of little edits to a lot of different classes. When the changes are all over the place, they are hard to find, and it's easy to miss an important change.

Feature Envy

When a function in one module / class spends more time communicating with functions or data inside another module than it does within its own module.

Primitive Obsession

Use of primitive values such as numbers or strings instead of small objects. Creating a primitive is so much easier than making a new type, hence they proliferate. If you have a large number of them, logically group them into their own types and move the related behavior into methods on these types.

Repeated Switch Statements

The problem with duplicate switches is that, whenever you add a new clause, you have to find and update all the switches. Replace switches with polymorphism, by creating new specialized types.

Large Class

When a class is trying to do too much, it often shows up as too many fields and methods. It dilutes the primary responsibility of the class and makes it hard to understand. A class with too much code is a prime breeding ground for duplicated code and bugs. Extract class to bundle a number of variables.

Data Class

These are classes that have fields, getters and setters for the fields, and nothing else. They are mostly used as dumb data holders and end up getting manipulated too much by their clients. They are often a sign of behavior in the wrong place. Move the behavior from clients into the data class itself.

Comments

It's not that developers shouldn't write comments at all. They aren't a bad smell, but a sweet smell. However, they're often used as a deodorant 😅 and many developers tend to use them everywhere. Often, they are superfluous and can be refactored away by 'intention-revealing named' methods. If you need a comment to explain what (not 'why') a block of code does, extract a well-named function.

A good time to use a comment is when you don't know what to do. In addition to describing what is going on, comments can indicate areas in which you aren't sure. A comment can also explain why you did something.

Let's look at a few important refactorings now. I will expand each topic below into a separate post in the future with code examples in Ruby (Martin uses JavaScript in the book).

The trick to reading this book is to carefully read through every single refactoring pattern and then try to apply it on your code base (you don't have to commit if it doesn't fix things). You can't just blow through it or you won't really learn it. And you can't just say "oh, I'll look up a refactoring when I need it", because then you don't know what to look for. - DHH

Extract Function

Look at a piece of code, understand what it is doing, then extract it into its own function named after its purpose. If you have to spend effort looking at a piece of code and figuring out what it's doing, then you should extract it into a function and name the function after the "what".
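As a rough illustration, here is a minimal Ruby sketch of Extract Function. The `owing_summary` method and the shape of the `invoice` hash are made up for this example and do not come from the book.

```ruby
# Before: the reader has to work out what the loop in the middle is for.
def owing_summary_before(invoice)
  outstanding = 0
  invoice[:orders].each { |order| outstanding += order[:amount] }
  "#{invoice[:customer]} owes #{outstanding}"
end

# After: the calculation is extracted and named after what it does.
def owing_summary(invoice)
  "#{invoice[:customer]} owes #{outstanding_amount(invoice)}"
end

def outstanding_amount(invoice)
  invoice[:orders].sum { |order| order[:amount] }
end
```

Reading `owing_summary` now takes a glance; the "how" is still one jump away in `outstanding_amount`.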

Extract Variable

Expressions can become very complex and hard to read. In such situations, local variables may help break the expression down into something manageable. It allows you to name a part of a more complex piece of logic, to better understand the purpose of what's happening.
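A small sketch of Extract Variable in Ruby. The pricing rule (a 5% discount above 500 units and shipping capped at 100) is only an assumed example.

```ruby
# Before: one dense expression.
def price_before(quantity, item_price)
  quantity * item_price -
    [0, quantity - 500].max * item_price * 0.05 +
    [quantity * item_price * 0.1, 100].min
end

# After: each part of the expression gets a name that explains its purpose.
def price(quantity, item_price)
  base_price = quantity * item_price
  quantity_discount = [0, quantity - 500].max * item_price * 0.05
  shipping = [base_price * 0.1, 100].min
  base_price - quantity_discount + shipping
end
```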

Encapsulate Variable or Data

Compared to functions, refactoring plain data is difficult, as you have to find and rename old references. First encapsulate the data by routing all its access through functions.

Encapsulating data also provides a clear point to monitor changes and usage of the data; you can easily add validation or access-control logic in the function.
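One possible way to sketch this in Ruby is to wrap what used to be a bare global in a module with a reader and a writer. The `Defaults` module and the validation rule are assumptions for illustration only.

```ruby
# Hypothetical module that owns data which used to be a global variable.
module Defaults
  @owner = { first_name: "Martin", last_name: "Fowler" }

  class << self
    # Readers get a copy, so they cannot change the shared data by accident.
    def owner
      @owner.dup
    end

    # All writes pass through here, which gives one place for validation.
    def owner=(new_owner)
      raise ArgumentError, "owner needs a :last_name" unless new_owner.key?(:last_name)

      @owner = new_owner.dup
    end
  end
end
```

Once every access goes through `Defaults.owner`, renaming or restructuring the underlying data only touches this module.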

Introduce Object

If you see groups of data items that regularly travel together, it's a data clump; replace it with a single data structure. Grouping scattered but related data into a structure makes the relationship between them explicit.

It also reduces the size of parameter lists for functions that use these items. You can also add new behavior via methods, either in a module of common functions or a class that combines the structure with functions. These new abstractions can greatly simplify your understanding of the domain.
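A minimal sketch of the idea, assuming a hypothetical `NumberRange` struct that replaces a loose `min`/`max` pair:

```ruby
# Before: the two ends of a range travel together as separate arguments.
def readings_outside_range_before(readings, min, max)
  readings.select { |reading| reading < min || reading > max }
end

# After: a small value object makes the pairing explicit
# and gives the related behavior a home.
NumberRange = Struct.new(:min, :max) do
  def contains?(value)
    value >= min && value <= max
  end
end

def readings_outside_range(readings, range)
  readings.reject { |reading| range.contains?(reading) }
end
```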

Combine Functions into Class

When you see a group of functions that operate closely together on a common body of data (usually passed as arguments to the function), extract and group them via a class.
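As a sketch, the functions below all take the same reading data, so they can be gathered into a class. The `Reading` class, its fields, and the tax threshold of 50 are assumptions made for this example.

```ruby
# Before: free functions that all operate on the same hash.
def base_charge_for(reading)
  reading[:quantity] * reading[:rate]
end

def taxable_charge_for(reading)
  [0, base_charge_for(reading) - 50].max
end

# After: a class owns the data and the values derived from it.
class Reading
  def initialize(quantity:, rate:)
    @quantity = quantity
    @rate = rate
  end

  def base_charge
    @quantity * @rate
  end

  def taxable_charge
    [0, base_charge - 50].max
  end
end
```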

Encapsulate Collection

In general, encapsulate any mutable data in your programs. This makes it easier to see when and how data structures are modified.

Access to a collection variable may be encapsulated, but if the getter returns the collection itself, then that collection's items can be altered by outside code. To avoid this, provide collection modifier methods (usually add and remove) on the class itself. This way, changes to the collection go through the owning class.

Also, ensure that the getter for the collection does not return the raw collection, so that clients cannot accidentally change it. Return a copy instead.
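A short sketch of how this could look in Ruby, using a hypothetical `Student` class with a list of courses:

```ruby
class Student
  def initialize(name)
    @name = name
    @courses = []
  end

  # Return a copy so callers cannot mutate the internal array.
  def courses
    @courses.dup
  end

  # All changes to the collection go through the owning class.
  def add_course(course)
    @courses << course
  end

  def remove_course(course)
    raise ArgumentError, "#{course} is not in the list" unless @courses.include?(course)

    @courses.delete(course)
  end
end
```

Appending to the array returned by `courses` only changes the caller's copy; the student's own list is untouched.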

Replace Primitive with Object

In early phases of development, you often represent simple facts as simple data items, like numbers or strings. As development proceeds, those simple items aren't so simple anymore.

Create a new class for that bit of data. At first, it simply wraps the primitive. But once you have it, you can put behavior specific to its needs.
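For instance, a priority that started life as a plain string might grow into something like the following. The `Priority` class and its list of levels are hypothetical.

```ruby
# Wraps what used to be a bare string such as "high".
class Priority
  include Comparable

  LEVELS = %w[low normal high rush].freeze

  attr_reader :value

  def initialize(value)
    raise ArgumentError, "unknown priority: #{value}" unless LEVELS.include?(value)

    @value = value
  end

  # Comparison now lives with the data instead of being repeated by callers.
  def <=>(other)
    LEVELS.index(value) <=> LEVELS.index(other.value)
  end

  def to_s
    value
  end
end
```

Callers can then write `order_priority > Priority.new("normal")` rather than comparing strings by hand.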

Replace Temp with Query

Whenever you see variables calculated in the same way in different places, extract them into a single function.

base_price = quantity * item_price

# to

def base_price
  quantity * item_price
end

When breaking up a large function, turning variables into their own functions makes it easier to extract parts of the function. It also allows you to avoid duplicating the calculation logic in similar functions.

Extract Class

In theory, classes should offer crisp, clear abstractions, and focus on one thing. In practice, classes grow. You add some operation here, a bit of data there.

You add some responsibility to the class feeling that it's not worth a separate class - but as that responsibility grows, the class becomes too complicated.

If a class is too big to understand, verify if it has too many responsibilities, and then extract them into their own classes. A good indicator is when a subset of the data and methods seem to go together, or change together.
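A minimal sketch, assuming a `Contact` class whose phone-number fields and formatting clearly belong together and can move into their own `TelephoneNumber` class (both names are invented here):

```ruby
# The extracted class: the data that changes together, plus its behavior.
class TelephoneNumber
  attr_reader :area_code, :number

  def initialize(area_code, number)
    @area_code = area_code
    @number = number
  end

  def to_s
    "(#{area_code}) #{number}"
  end
end

# The original class keeps a reference and delegates to it.
class Contact
  attr_reader :name, :telephone_number

  def initialize(name, area_code, number)
    @name = name
    @telephone_number = TelephoneNumber.new(area_code, number)
  end

  def telephone
    telephone_number.to_s
  end
end
```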

Hide Delegate

Good modular design needs encapsulation. Clients need to know less about other parts of the system. Then, when things change, fewer modules need to be told about the change - which makes the change easier to make.

manager = person.department.manager

# to

manager = person.manager

class Person
  def manager
    department.manager
  end
end

Move Field / Statements / Function

Code is easier to understand when things that are related to each other appear together.

Modularity allows you to make modifications to a program while only having to understand a small part of it. For this, you need to ensure that related code is grouped together. As you better understand what you're doing, you learn how to best group together related code. For this, you need to move stuff around.

The strength of a program is founded on its data structures. If you have a good set of data structures that match the problem, then the behavior code is simple. Poor data structures lead to lots of code whose job is merely dealing with poor data. Not only does the code become harder to understand, but the data structures also obscure what the program is doing.

Move a function when it references elements in other contexts more than the one it currently resides in. Moving it together with those elements often improves encapsulation, allowing other parts of the software to be less dependent on the details of this module.

As soon as you realize that a data structure isn't right, it's vital to change it. Poor data structures will confuse your thinking and complicate your code.

Split Loop

You often see loops that are doing two different things at once just because they can do that with one pass through a loop. But if you're doing two different things in the same loop, then whenever you need to modify the loop you have to understand both things. By splitting the loop, you ensure you only need to understand the behavior you need to modify. Splitting a loop can also make it easier to use.

Many programmers are uncomfortable with this refactoring, as it forces you to execute the loop twice. However, you can always profile it and combine if it indeed is a bottleneck. In reality, the actual iteration through even a large list is rarely a bottleneck, and splitting the loops often enables other, more powerful, optimizations.
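A small Ruby sketch of Split Loop. The `people` array of hashes with `:age` and `:salary` keys is an assumed data shape.

```ruby
# Before: one loop computes two unrelated things.
def summary_before(people)
  youngest = Float::INFINITY
  total_salary = 0
  people.each do |person|
    youngest = person[:age] if person[:age] < youngest
    total_salary += person[:salary]
  end
  [youngest, total_salary]
end

# After: each calculation has its own pass and its own function.
def youngest_age(people)
  people.map { |person| person[:age] }.min
end

def total_salary(people)
  people.sum { |person| person[:salary] }
end
```

Once split, each loop collapses into a single collection call, which is the kind of follow-up simplification the split makes possible.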

Remove Dead Code

If a piece of code isn't used anymore, we should delete it.

Unused code is still a significant burden when trying to understand how the software works. If it isn't very obvious, the programmers still have to spend time understanding what it's doing and why changing it doesn't seem to alter the output as they expected.

Decompose Conditional

One of the most common sources of complexity in a program is complex conditional logic. Long conditions can easily obscure the 'why' of the code. As with any large block of code, make your intention clearer by extracting the conditional to a function named after the intention.
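To make this concrete, here is a sketch with a made-up seasonal pricing rule. The `Plan` struct, its fields, and the rates are assumptions for illustration.

```ruby
require "date"

# Hypothetical pricing plan used only for this example.
Plan = Struct.new(:summer_start, :summer_end, :summer_rate,
                  :regular_rate, :regular_service_charge)

# Before: the condition and both branches hide their intent.
def charge_before(date, quantity, plan)
  if date >= plan.summer_start && date <= plan.summer_end
    quantity * plan.summer_rate
  else
    quantity * plan.regular_rate + plan.regular_service_charge
  end
end

# After: the condition and each branch are named after their intention.
def charge(date, quantity, plan)
  summer?(date, plan) ? summer_charge(quantity, plan) : regular_charge(quantity, plan)
end

def summer?(date, plan)
  date.between?(plan.summer_start, plan.summer_end)
end

def summer_charge(quantity, plan)
  quantity * plan.summer_rate
end

def regular_charge(quantity, plan)
  quantity * plan.regular_rate + plan.regular_service_charge
end
```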

Replace Nested Conditional with Guard Clauses

The guard clause says: "this is not the core to this function. If this happens, do something and get out."

def pay_amount
  if teen?
    result = teen_amount
  else
    if adult?
      result = adult_amount
    else
      if old?
        result = old_amount
      else
        result = normal_amount
      end
    end
  end

  result
end

# becomes

def pay_amount
  return teen_amount if teen?
  return adult_amount if adult?
  return old_amount if old?
  normal_amount
end

Replace Conditional with Polymorphism

Remove the duplication of the common switch logic by creating classes for each case and using polymorphism to bring out the type-specific behavior.

Think of the logic as a base case with variants. The base case may be the most common or most straightforward. Put this logic into a superclass. Then put each variant case into a subclass, which you express with code that emphasizes its difference from the base class.
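A compact sketch of the base-case-plus-variants idea. The bird types and plumage rules are loosely modelled on the kind of example the book uses, but the exact names and values here are assumptions.

```ruby
# Before: a case statement switches on a type code.
def plumage_before(bird)
  case bird[:type]
  when "EuropeanSwallow"
    "average"
  when "NorwegianBlueParrot"
    bird[:voltage] > 100 ? "scorched" : "beautiful"
  else
    "unknown"
  end
end

# After: the base case lives in the superclass...
class Bird
  def plumage
    "unknown"
  end
end

# ...and each variant overrides only what differs.
class EuropeanSwallow < Bird
  def plumage
    "average"
  end
end

class NorwegianBlueParrot < Bird
  def initialize(voltage)
    @voltage = voltage
  end

  def plumage
    @voltage > 100 ? "scorched" : "beautiful"
  end
end
```

Adding a new kind of bird now means adding a subclass instead of hunting down every `case` on the type code.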

Introduce Special Case

customer_name = "occupant" if (customer == unknown)

# becomes

class UnknownCustomer
  def name
    "occupant"
  end
end

Create a special-case element that captures all the common behavior. This lets you replace most of the special-case checks with simple calls.

Separate Query from Modifier

Strive to write query functions without any observable side effects. If you come across a method that returns a value but also has side effects, try to separate the query from the modifier.

def get_balance_send_bill
  result = customer.invoices.sum { |i| i.total }
  customer.send_bill
  result
end

# becomes

def balance
  customer.invoices.sum { |i| i.total }
end

def send_bill
  customer.send_bill
end

Parameterize Function

If you see two functions that carry out very similar logic with different literal values, remove the duplication by using a single function with parameters for the different values. This increases the usefulness of the function, since you can apply it elsewhere with different values.
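A tiny sketch, using invented salary-raise functions and integer percentages to keep the arithmetic exact:

```ruby
# Before: two functions that differ only in a literal value.
def ten_percent_raise(salary)
  salary + salary * 10 / 100
end

def five_percent_raise(salary)
  salary + salary * 5 / 100
end

# After: one function, with the literal turned into a parameter.
def raise_salary(salary, percent)
  salary + salary * percent / 100
end
```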

Remove Flag Argument

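# Ruby's assignment syntax passes a single value to a setter such as dimension=,
# so the flag-driven version is written here as an ordinary method.
def set_dimension(name, value)
  @height = value if name == "height"
  @width = value if name == "width"
end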

# becomes

def height=(value)
  @height = value
end

def width=(value)
  @width = value
end

A flag argument tells the function which logic to execute. It complicates the process of understanding what function calls are available and how to call them. It's better to provide an explicit function for each separate task.

Preserve Whole Object

When you see code that passes multiple properties from an object into a function, replace them by passing the whole object. This keeps the function signature simple and understandable, and the parameter list does not have to change if the function later needs more of the object's data. You may also find that the function's logic belongs on the object itself (a sign of feature envy).
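A brief sketch of the difference, assuming a hypothetical `HeatingPlan` that checks a room's `TemperatureRange`:

```ruby
TemperatureRange = Struct.new(:low, :high)

class HeatingPlan
  def initialize(low, high)
    @low = low
    @high = high
  end

  # Before: the caller unpacks the range and passes the pieces.
  def within_range_values?(low, high)
    low >= @low && high <= @high
  end

  # After: the caller hands over the whole object.
  def within_range?(range)
    range.low >= @low && range.high <= @high
  end
end
```

The call site shrinks from `plan.within_range_values?(room.low, room.high)` to `plan.within_range?(room)`.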

Replace Function with Command

Functions, whether on their own or attached to objects as methods, are one of the fundamental building blocks of programming. However, there are times when it's useful to encapsulate a function into its own object, referred to as a "command object".

A command is built around a single method, whose purpose is to break down the original (long and complicated) function via multiple private methods and instance variables.

Command objects provide a powerful mechanism for handling complex computations. They can easily be broken down into separate methods sharing common state through instance variables.
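The shape of a command object could look roughly like this. The `Scorer` class and its scoring rules are invented purely for illustration.

```ruby
# A command built around a single public method, execute.
class Scorer
  def initialize(candidate, medical_exam)
    @candidate = candidate
    @medical_exam = medical_exam
    @result = 0
  end

  # The steps of the original long function become small private methods
  # that share state through instance variables.
  def execute
    score_smoking
    score_origin
    @result
  end

  private

  def score_smoking
    @result -= 10 if @medical_exam[:smoker]
  end

  def score_origin
    @result += 5 if @candidate[:local]
  end
end
```

A caller runs it with `Scorer.new(candidate, medical_exam).execute`.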

Replace Commands with Functions

If the function isn't too complex, then a command object is more trouble than it's worth and should be turned into a regular function.

Pull Up Method / Field / Constructor

Eliminating duplicate code is important. If you notice duplicate methods on subclasses, pull them up into the superclass. If the method definitions are different but the logic is the same, first change the methods so they match, then pull them up.

Extract Superclass

You don't have to plan the type hierarchy carefully in advance. As the program evolves, if you see two classes doing similar things, use inheritance to pull their similarities together into a superclass. First pull up the common fields, then common methods.
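As a sketch, suppose an employee and a department both carry a name and both compute an annual cost from a monthly cost. The `Party` superclass and all names below are assumptions for this example.

```ruby
# The extracted superclass holds the shared field and the shared method.
class Party
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Common logic; each subclass supplies its own monthly_cost.
  def annual_cost
    monthly_cost * 12
  end
end

class Employee < Party
  attr_reader :monthly_cost

  def initialize(name, monthly_cost)
    super(name)
    @monthly_cost = monthly_cost
  end
end

class Department < Party
  def initialize(name, staff)
    super(name)
    @staff = staff
  end

  def monthly_cost
    @staff.sum(&:monthly_cost)
  end
end
```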

That's a wrap. I hope you liked this article and you learned something new. If you would like to read more book notes (not restricted to programming and software development), check out my other blog at https://book-notes.pages.dev.

As always, if you have any questions or feedback, didn't understand something, or found a mistake, please leave a comment below or send me an email. I reply to all emails I get from developers, and I look forward to hearing from you.
