Chef Roles Aren’t Evil

“If roles are evil, what about Al-Qaeda?”

You may laugh, but this is an actual quote from a session at the Opscode Community Summit this year. I want to dive deeper into the community’s apparent dislike for roles in Chef, explain why I think they are still useful, and outline some design patterns for using both them and role cookbooks effectively.

“Stahp Using Roles”

I think the Chef community’s revolt against roles crystallized with Jamie Winsor’s presentation entitled “The Berkshelf Way”. There is a slide that looks like this:

[Image: slide from “The Berkshelf Way” reading “Stahp Using Roles”]

Jamie’s a great guy and an incredible contributor to the Chef ecosystem. (We voted him an Awesome Chef, after all.) However, his advice — just as my advice in this post — should not be blindly followed without ensuring that it applies to your particular situation, and understanding both the advantages & disadvantages.

The title of the talk, “The Berkshelf Way”, also has unintended consequences when it comes to roles and whether you should use them. It implies that if you want to use Berkshelf, you must rigorously follow each and every principle in the talk. (Also, I wonder whether the above deck residing under Opscode’s Slideshare account makes readers believe that Jamie’s views on roles are the Official Opscode Viewpoint, whereas no such thing exists.)

Other well-known folks in the community, though, have also spoken out against roles. Doug Ireton from Nordstrom, for example, advocates against setting attributes or the run list in roles, which of course raises the question: what are roles good for?

What’s a Role Again?

Before I address some of the concerns that Jamie and Doug have raised, let’s review what a role is. Opscode’s documentation states:
A role is a way to define certain patterns and processes that exist across nodes in an organization as belonging to a single job function. Each role consists of zero (or more) attributes and a run list.
In other words, a role represents a server function, consisting of the run list and attributes needed to make that node take on that function.

Roles, of course, also factor into the attribute precedence/merge order chart:

[Image: Chef attribute precedence table]

Wait, what’s this force_default and force_override stuff?

You’ll notice that force_default and force_override are recent additions to this matrix. This looks like someone got backed into a corner with attribute precedence because they weren’t using roles anymore. If you don’t use roles, you lose attribute precedence levels 4 and 11, which means the only way to override a default attribute set in attribute files, recipes or environments is to use override. I bet the user was already using override levels 9, 10 or 11 for something else, so they didn’t have “enough levels of precedence”. As such, we wound up with more forcing.

One of my favorite quotes from Bryan McLellan, Opscode’s Technical Program Manager for Open Source, is that “more forcing is never the final forcing”. If you are doing this much forcing, you might be doing something wrong. In my view, “doing something wrong” is not using roles at all.
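
To make the precedence argument concrete, here is a small plain-Ruby model of the fifteen levels. It is only a sketch: the level ordering is my reading of the Chef 11 precedence table, and the names `PRECEDENCE` and `effective_value` are invented for illustration and are not part of Chef.

```ruby
# Simplified model of the Chef 11 precedence table, lowest to highest.
# Each entry is [attribute type, where it was set]. This is an assumption
# based on the documented table, not code taken from Chef itself.
PRECEDENCE = [
  %i[default attribute_file],        # 1
  %i[default recipe],                # 2
  %i[default environment],           # 3
  %i[default role],                  # 4
  %i[force_default attribute_file],  # 5
  %i[force_default recipe],          # 6
  %i[normal attribute_file],         # 7
  %i[normal recipe],                 # 8
  %i[override attribute_file],       # 9
  %i[override recipe],               # 10
  %i[override role],                 # 11
  %i[override environment],          # 12
  %i[force_override attribute_file], # 13
  %i[force_override recipe],         # 14
  %i[automatic ohai]                 # 15
].freeze

# Returns the value set at the highest precedence level present in settings.
def effective_value(settings)
  winner = PRECEDENCE.reverse.find { |level| settings.key?(level) }
  winner ? settings[winner] : nil
end

# With a role, a plain default (level 4) beats the environment default (level 3).
with_role = {
  %i[default attribute_file] => 'cookbook value',
  %i[default environment]    => 'environment value',
  %i[default role]           => 'role value'
}

# Without a role, a cookbook has to reach for force_default (level 5)
# to beat the same environment default.
without_role = {
  %i[default attribute_file]       => 'cookbook value',
  %i[default environment]          => 'environment value',
  %i[force_default attribute_file] => 'forced cookbook value'
}

puts effective_value(with_role)
puts effective_value(without_role)
```

In this model the only way to win without a role is to climb to a "force" level, which is the escalation described above.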

In Defense of Roles

Let’s address the main complaint about roles: they’re not versioned. But what most people want with versioned roles is to version the run list.

Suppose I have a base role and it contains three recipes in its run list: recipe[ntp::client], recipe[chef-client::config], recipe[chef-client]. If I want to add a fourth recipe, recipe[openssh], I’m faced with adding that across all machines that run the base role and deploying it right away. I might break my entire infrastructure that way! This is terrible, right? Yes, it is, which is why folks invented the idea of the “role cookbook” with one or more recipes emulating the run list of that role using include_recipe:

include_recipe "ntp::client"
include_recipe "chef-client::config"
include_recipe "chef-client"
Now if I need to add recipe[openssh] to the run list, I can modify this recipe, adding include_recipe "openssh", bump the cookbook version, and deploy it across my environments in a controlled way.

Another reason why roles are still valuable: Chef Server has an index for roles, so you can dynamically discover other machines based on their role (function), e.g.

webservers = search(:node, 'role:mycorporatewebservers')
If you don’t use roles at all, you don’t get to do this.
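
As a rough sketch of how a recipe might consume such a search, consider a load balancer finding its backends. This is a recipe fragment that only runs inside a Chef run, not standalone Ruby. The file path, the template name haproxy.cfg.erb and the variable name backends are hypothetical; search, node['ipaddress'] and the template resource are standard Chef.

```ruby
# Recipe fragment: find every node whose run list includes the role.
backends = search(:node, 'role:mycorporatewebservers')

# Collect the Ohai-reported IP of each backend, sorted so the rendered
# file does not change just because the search results came back reordered.
backend_ips = backends.map { |n| n['ipaddress'] }.compact.sort

# Hypothetical template that renders one backend line per address.
template '/etc/haproxy/haproxy.cfg' do
  source 'haproxy.cfg.erb'
  variables(backends: backend_ips)
end
```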

How is this different than the “Berkshelf Way”?

The “Berkshelf Way” advocates never using roles, but simply adding recipes directly to the run lists of your nodes.

Since you lose out on the attribute precedence and merge order that way, I recommend an alternative: having a role in which you set role-specific attributes, if required. The only thing you delegate to a role cookbook is the run list; the role’s run list contains one item, which is the default recipe of the role cookbook. You should have as many role cookbooks as you do roles, and each of those cookbooks should have one and only one recipe in it: the default recipe.
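
A role following this pattern might look like the sketch below, written in the role Ruby DSL. The cookbook name role_base and the attribute values are assumptions made up for illustration, and the ['ntp']['servers'] key assumes the ntp cookbook reads its server list from there.

```ruby
# roles/base.rb
name 'base'
description 'Baseline configuration applied to every node'

# The run list has exactly one item: the default recipe of the
# (hypothetical) role cookbook, which carries the versioned run list.
run_list 'recipe[role_base]'

# Role-specific attributes stay in the role, so they keep their
# place in the precedence order.
default_attributes(
  'ntp' => {
    'servers' => ['ntp1.example.com', 'ntp2.example.com']
  }
)
```

The design choice here is that changes to the run list go through a cookbook version bump, while the role itself rarely needs to change.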

Wrapping Up: Sensible Design of Roles and Role Cookbooks

In summary:
  • Roles are useful because they factor into attribute precedence and merge order. Without them, you simply have more forcing.
  • Roles allow you to find servers by function within your cookbook code. For example, load balancers can find their backends, app servers can find their database servers, and so on.
  • Role cookbooks allow you to version your role’s run list.
  • If you use role cookbooks, have a role cookbook for every role (1:1). This minimizes the number of dependencies in your role cookbook’s metadata. Don’t have a single role cookbook called “roles”, because this cookbook will depend on every other cookbook in your infrastructure.
  • Each role cookbook should have one and only one recipe that contains enough include_recipe statements to form the run list you would have previously put in the role itself.
  • Keep your roles small, so that the blast radius of making changes to a role’s attributes or the role cookbook is kept to a minimum. Unless you have a very small infrastructure, do not have a role called “webserver”. Instead, have many roles with narrow functions (e.g. “corpsitewebserver”, “appfoowebserver”)
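
To illustrate the 1:1 point, the metadata for a role cookbook backing the base role could be as small as the sketch below. The cookbook name role_base and the version number are hypothetical; the dependencies mirror the recipes used in the base role example earlier in this post.

```ruby
# role_base/metadata.rb
name 'role_base'
version '0.1.0'

# Only the cookbooks this one role includes appear here. A single
# catch-all "roles" cookbook would have to list every cookbook in use.
depends 'ntp'
depends 'chef-client'
depends 'openssh'
```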

Epilogue: Versioned Roles

Opscode is likely to add some kind of “versioned role” structure in Chef 12. Until then, the foregoing design principles should stop you from shooting yourself in the foot and having to force-all-the-things.

Julian is engineering lead for field solutions at Chef & started his career at Chef in professional services. His first experience with Chef was at SecondMarket, a New York-based alternative markets startup, and he has fifteen years of systems administration & software development experience at outfits large and small. When he's not helping customers, he enjoys good craft beer, indie music, and writing biographies about himself in the third person.

  • http://blog.mindlesstechie.net/ John Alberts

    I think your epilogue about versioned roles is pretty much the entirety of the reasoning for not using roles currently. Since roles aren’t currently versioned, it makes them pretty much useless if you have multiple environments. Common unversioned code across multiple environments is an accident waiting to happen. I really hope environments get versioning support in Chef 12 as well, since it’s a similar scenario with environments.

    • Julian Dunn

      I’m suggesting that you continue to use roles, but to use a (versioned) role cookbook to drive the run_list if you want to version that. This minimizes the “common unversioned code across multiple environments” problem.

      • http://blog.mindlesstechie.net/ John Alberts

        True, that definitely helps with the problem of run_lists, which I think is a pretty limited use case. I think many people, as I do, also put attributes in roles, especially when you’re looking for an easy way to customize something from an upstream cookbook that you don’t want to fork. Without role versioning, your real options are to use a wrapper cookbook or a role cookbook for now. Just sayin, “I’m counting the days to Chef 12 now that I know we’re finally going to get versioned roles”. :)

      • Michael Glenney

        In that case I agree with you too :) I should have read the article a little more closely I guess. The first 2 sentences of your wrap up start with “Roles are useful” and “Roles allow you”. I guess that’s where I get thrown off. I don’t like to use them at all. I can achieve what you discuss (aside from the attribute precedence part – you got me there) in other places without the added layer of configuration and complexity.

        • Lamont Granquist

          You also lose computable attributes in attributes files, which pushes your attribute logic into recipes and you get compile/converge phase issues, which leads you to eventually needing lazy evaluation, and you’re buying yourself future headaches. If all you’re doing is something really simple it doesn’t matter. But as your cookbook complexity grows, you are going to be painting yourself into more and more corners by not using them at all.

  • Bruno Xavier

    I totally agree: despite what the majority thinks, roles are not evil after all. You should merge their benefits with design patterns that allow you to keep your system versioned. The Berkshelf way kind of breaks the principle of role cookbooks, as Jamie suggests the total absence of roles. Role cookbooks aim to be 1:1, as Julian mentions, so that you can keep your configuration cohesive and use internal resources like search, for instance, without waiting for the convergence of dependent nodes as you would with discovery attributes set inside recipes.

  • Michael Glenney

    I agree with John. We came to this conclusion (no roles) well before we ever heard of the “Berkshelf Way”. Just wanted to get that out there so Jamie doesn’t feel picked on :) As for environments, I’m fine with them not being versioned. I only use them to define “logical” environments (dev, prod, qa) for general config, specific to that environment, for which I don’t care about historical record. NTP VIP hostname for example. Who cares what that was 9 months ago?

    For more specific environment information I use “physical” environments (many:1 relationship with logical environments), with versioned config via data bags for a quick and dirty setup, or with the configuration passed in as json data to the chef run from an orchestration piece.

    Role functionality is achieved through role cookbooks (aka Top level cookbooks, aka application cookbooks, aka wrapper cookbooks, etc., etc.) as Julian discussed.

    • Julian Dunn

      In my mind, “role cookbooks” are a distinct concept from “wrapper” cookbooks.

      • Role cookbooks are used to implement the functionality I described here.
      • Wrapper cookbooks are used to mutate/customize an upstream cookbook (for example, a community cookbook) without having to fork the upstream cookbooks.

      At some point in the future, I’ll also write a post about best practices for use of the wrapper/upstream (or application/library) pattern.

      Finally, you’re correct about the physical versus logical environment distinction. Actually, it’s best to think of Chef Environments as being “policy groups”. (We might have named them that way rather than “Environments” if we were doing things over again. :-) ) Therefore, it’s actually fine to have more Chef Environments than physical environments if — and only if — you have a need to apply changes across some subset of your physical environment on a regular basis.

  • Jon Cowie

    Just as a slight corollary to Julian’s point above about not having a single role called “Webserver” – at Etsy we’ve found that this very approach actually makes sense for us, because we have a large number of single-purpose web servers, serving a single site (Etsy.com) which are identically configured. In the specific case I just mentioned, we actually have a WebBase role which contains recipes & attributes shared between our web and api servers (which have nearly identical config), then a sub-role for each class which reflects the differences.

    As a general rule, in our case, our role names map to groups of identical, mono-purpose servers, ie web servers, mysql shards etc.

    We somewhat mitigate the lack of versioning in roles and the blast radius of role changes through the role command wrappers in knife-spork, which give us visibility into Role changes as they occur, and in the way we manage the rollout of infrastructure upgrades, which you can read more about here: http://codeascraft.com/2013/08/02/infrastructure-upgrades-with-chef/

    I’d like to echo Julian’s point about blindly following people’s advice without taking a good look at your organisation’s needs and whether the advice being given actually makes sense for you. In Etsy’s case, we actually use roles very extensively, and do not use role cookbooks, because it makes sense for our particular infrastructure and workflow (in our case, we have to accommodate ~60 people regularly making chef changes).

    That doesn’t stop me agreeing with everything written in the above post, however. I think when used incorrectly, roles can cause you a lot of problems. My advice, however, is don’t accept that just because I said so, or Jamie said so, or Julian said so. Take a good look at how your organisation functions and the particular challenges you’re dealing with, then make an informed decision. Chef is by its very nature generic, because it has to work for an incredibly wide and diverse range of platforms and infrastructures – one of the reasons Opscode have never issued an official approved chef workflow, for example. What made sense for Riot Games or Etsy may not make sense for you.

  • Michael Weinberg

    Thank you for writing this, Julian!

  • Lamont Granquist

    Heh, you beat me to it…

    I’m working on a blog post a bit more extensive than this, but I got knocked out by a cold that I picked up at the summit… You also missed a really critical point which is that if you have computed attributes in your attributes files then you lose the ability to override them correctly because your role cookbook attributes are parsed after the computed attributes in the library, whereas your roles attributes are parsed before any cookbook attributes. By eliminating roles completely you break computable attributes in cookbook attributes (a major driving feature of Chef 11). What that leads to is computable attributes being put into cookbooks. This just about inevitably leads down the path of running into compile-vs-converge issues in cookbook recipes where attributes aren’t set the way you think they should be because of compile-time evaluation. That leads into needing to use the lazy attribute to resources, or into the need for something like CHEF-4110, which should be a huge flashing sign that you’re doing something horribly wrong.

    What people should be doing is minimizing the amount of mutating of attributes going on in cookbook recipe code, that should mostly all be moved to attribute files which removes the whole compile/converge headache completely. Putting attribute setting logic in the attributes files is the correct way to go about writing cookbooks (perhaps there’s an edge condition which really requires mutating attributes in recipe code, but first always start with the attribute files).

    Then roles can be put on a diet, but should not be thrown away. It can be helpful in using test-kitchen to use role cookbooks anyway for testing purposes. It can also be useful to override attributes in the role cookbook since dealing with role artifacts in TK is a bit annoying. But, if you throw away roles entirely then you lose the parse-order precedence, and when your computed attributes fail to get overridden in your role cookbooks, those attributes need to go into roles — moving that logic into recipes doesn’t end well. You /can/ put them on a diet, and they should be versioned and TK and other tools should support them better.

    • Julian Dunn

      Yes, Dan DeLeo pointed out the computed attributes advantage, but it got left on the cutting room floor as it’s a bit difficult to grok. And there are subtleties when computing attributes from wrapper cookbooks as well, e.g.

      base:

      default['postgresql']['version'] = '9.1'
      default['postgresql']['client']['packages'] = ["postgresql-client-#{node['postgresql']['version']}", "libpq-dev"]
      default['postgresql']['server']['packages'] = ["postgresql-#{node['postgresql']['version']}"]

      wrapper:

      default['postgresql']['version'] = '9.2'

      won’t actually “recompute” node['postgresql']['client']['packages'] or node['postgresql']['server']['packages'] unless you forcibly recompute them in the wrapper.

      That’s a topic for another post, on proper use of wrapper cookbooks… in my queue.

      • Lamont Granquist

        Right, that is exactly the point.

        You need to do computed attributes in the attributes file of the ‘base’ cookbook, and then your role needs to set postgresql version to 9.2. You shouldn’t be setting those attrs in a wrapper cookbook or a library cookbook, but on the role itself.

        If you don’t do that, what you wind up with is an arms race where you push computed attributes into the recipes. Then eventually you wind up with compile-vs-converge headaches if you go down that road long enough. I believe Dan said recently that in an ideal world he’d break the ability to set attributes in cookbooks completely because of how people can wind up getting painted into corners (lazy attributes and CHEF-4110 again).

        The compile/converge issues are central to how Chef works and are probably part of the landscape that isn’t very fixable, so it makes more sense to try to focus on making roles work, rather than trying to throw them out.
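
The ordering problem discussed in this thread can be modelled in plain Ruby, without Chef. This is only a sketch: a nested hash stands in for node attributes, the method name load_base is invented, and `||=` is a loose stand-in for "a default does not replace a value that a role has already supplied". It shows why setting the version after the base attributes file has run leaves the computed package names stale, while having the role's value in place beforehand does not.

```ruby
# Stand-in for the 'base' cookbook's attributes file. The package names
# are interpolated once, at the moment this code runs.
def load_base(attrs)
  pg = (attrs['postgresql'] ||= {})
  pg['version'] ||= '9.1' # keeps a value that is already present
  version = pg['version']
  pg['client'] = { 'packages' => ["postgresql-client-#{version}", 'libpq-dev'] }
  pg['server'] = { 'packages' => ["postgresql-#{version}"] }
  attrs
end

# Case 1: a wrapper cookbook changes the version after base has run.
wrapped = load_base({})
wrapped['postgresql']['version'] = '9.2'

# Case 2: the role's value is already in place when base is parsed.
with_role = load_base({ 'postgresql' => { 'version' => '9.2' } })

puts wrapped['postgresql']['server']['packages'].inspect   # ["postgresql-9.1"]
puts with_role['postgresql']['server']['packages'].inspect # ["postgresql-9.2"]
```

In the first case the version reads 9.2 but the package lists still name 9.1, which is the "won't recompute" behaviour described above.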

      • jsirex

        I think defining attributes by reference to other attributes like this is bad practice. It is always possible to change attributes at runtime, so the recipe must calculate such attributes at runtime too. In the postgresql cookbook’s case, move

        default['postgresql']['client']['packages'] = ["postgresql-client-#{node['postgresql']['version']}", "libpq-dev"]
        default['postgresql']['server']['packages'] = ["postgresql-#{node['postgresql']['version']}"]

        from attributes/defaults to recipe/

  • Lamont Granquist

    After digging into yet another CHEF ticket from a frustrated user tonight who had gone down the role cookbooks route, I’m actually going to say that “Roles are Evil” has absolutely FAILED and as a philosophy cannot be salvaged. Nobody should be suggesting it’s a good idea anymore. Roles need to be fixed, then people really need to start using them again.

    At the Chef Summit people were looking for Opscode to suggest Best Practices, Patterns, Anti-Patterns, etc. because it’s all gotten very complicated and people were looking for a way to know that they’re going down a road that doesn’t eventually lead to a dead end. Well, the roles-are-evil pattern does lead to a dead end. If you’ve wondered why there need to be 15 different levels of attribute precedence and why force_default and force_override were necessary, it’s because someone went way too far down the role cookbook road and got painted into a corner. Not everyone who jettisons roles will necessarily hit that dead end, but given sufficient complexity of what you’re managing, it’s going to happen.

  • eric_tucker_bluespurs

    “Role cookbooks allow you to version your role’s run list.”

    How does this solve the problem that your attributes aren’t versioned? If I want to change an attribute value in my role that is shared across all of my environments, I’m at risk of breaking production due to that change.

    Then it becomes “move the attributes to the role cookbook as well” and we’re back to square one: roles are evil.

    Thoughts?
