Most of the time, I tend to talk about the brewing side of what I do here at Wall Brew. Today, I want to scratch the surface of the technical aspects of brew-bot and how I brew with programs; specifically, how I work with BeerXML.
Where Did This Start?
If you remember, a few months ago I created a “normalized” Märzen by scraping and analyzing a few hundred recipes. This was possible because of a content format that is almost universally shared between applications that work with beer. BeerXML, which comes from the creator of BeerSmith, provides rules for everything that you might want to know about a recipe, a judging style, or even a particular step in the mashing process.
Thankfully, this means most of the public APIs and recipe datasets share a single format and a single system of units: BeerXML records weights in kilograms and volumes in liters. At the time, brew-bot was still a minor side-project and most of the back-of-the-napkin calculations it performed were based on a small, hand-selected group of ingredients and data. To help prepare it for the big leagues, I’d need to be able to read and understand recipes that came from other places.
The First Challenge
As the name implies, BeerXML describes beer data in XML because most of the original applications were built for your average Windows stack. While virtually every language has support for XML, not all of it is good. In Clojure, the contrib library clojure.data.xml provides all of the tools you need to avoid writing a parser from scratch. While it parses XML faithfully, the map it lands the data into leaves a lot to be desired.
Imagine the following sample XML document:
<?xml version="1.0" encoding="UTF-8"?>
<recipe>
  <fermentables>
    <extract>Dark Malt Extract</extract>
    <grain>American 2-Row</grain>
  </fermentables>
</recipe>
The naive parser will return the following EDN:
{:tag :recipe,
 :attrs {},
 :content ({:tag :fermentables,
            :attrs {},
            :content ({:tag :extract,
                       :attrs {},
                       :content ("Dark Malt Extract")},
                      {:tag :grain,
                       :attrs {},
                       :content ("American 2-Row")})})}
Since the parser has to respect the possibility of XML attributes, even though they aren’t a part of the BeerXML standard, it defaults to the above. This made navigating and even reading the data difficult. Being more used to handling JSON or EDN, I had hoped for something of the form:
{:recipe
 {:fermentables
  [{:extract "Dark Malt Extract"}
   {:grain "American 2-Row"}]}}
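A minimal sketch of how that reshaping might be done is below. It is not the approach I ended up taking; simplify is a hypothetical name, and it assumes that attributes can be dropped and that the caller knows which tags hold lists of records. It runs against the plain map from above, so it needs no dependencies.

```clojure
;; The parser output from above, written as a plain map.
(def parsed
  {:tag :recipe
   :attrs {}
   :content [{:tag :fermentables
              :attrs {}
              :content [{:tag :extract :attrs {} :content ["Dark Malt Extract"]}
                        {:tag :grain :attrs {} :content ["American 2-Row"]}]}]})

(defn simplify
  "Collapse an element of the form {:tag :attrs :content} into {tag value}.
   Leaf text becomes a string, tags named in list-tags keep their children
   as a vector, and every other tag merges its children into one map.
   Attributes are discarded (an assumption that suits BeerXML)."
  [list-tags {:keys [tag content]}]
  (let [children (map #(simplify list-tags %) (filter map? content))]
    {tag (cond
           (empty? children)         (apply str content)
           (contains? list-tags tag) (vec children)
           :else                     (apply merge children))}))

(simplify #{:fermentables} parsed)
```

Passing the list tags explicitly is a design choice for the sketch: the XML alone cannot say whether a tag with one child is a record or a list that happens to hold a single item.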
Additionally, record types in the BeerXML spec get wrapped into categories. Each file I was working with was actually a collection of recipes, the contents of which were a list containing a single recipe. So, for those keeping track at home, I had to use the following to grab all of the ingredient data for each page I wanted to visit:
(require '[clj-http.client :as http]
         '[clojure.data.xml :as xml])

(defn fetch-recipe
  "Fetch the BeerXML at recipe-url, and extract the inner recipe"
  [recipe-url]
  (let [page (http/get recipe-url)]
    (when (= 200 (:status page))
      (-> page
          :body
          xml/parse-str   ; root element: the collection of recipes
          :content        ; the list inside that collection
          first           ; the single recipe element
          :content))))    ; the fields of that recipe
Immediate Action
In my professional life, we have a company value named “Bias to Action.” What it boils down to is a preference for doing something small and quick, learning from it, and continually iterating. This is meant to win out over analysis paralysis and upfront planning, which historically fall apart in application development.
So, instead of solving the difficult problem of reshaping a ton of data from and for standard parsers, I solved the immediate problem of pulling the data I wanted. I figured the experiment would tell me three useful things:
- If the entire idea of data analysis was even worthwhile. Also known as, “Will the beer even be drinkable?”
- How much of a pain would the framework/format be to use in a pet project? Also known as, “Will I just drink beer instead?”
- What will I need to plan for if this works out? Also known as, “Will this thing make other people’s beer?”
A few hours of hacking later, and I had built a small parser to pull recipes from BeerXML, convert them into Imperial measurements, and relabel them to match brew-bot. I turned each recipe into a listing of its ingredients by type, where each ingredient was a key-value pair between the name and its weight.
(defn extract-tag
  "Find the matching tag in xml"
  [tag xml]
  (:content (first (filter #(= (:tag %) tag) xml))))

(defn value-at-tag
  "Find the value at the matching tag in xml"
  [tag xml]
  (first (extract-tag tag (:content xml))))

;; kg->lb (and l->gal below) are unit-conversion helpers defined elsewhere
(defn xml->map
  "Transform a list of XML ingredients to a map of name to amount"
  [xml]
  (apply merge
         (map #(hash-map (value-at-tag :NAME %)
                         (kg->lb (value-at-tag :AMOUNT %)))
              xml)))

(defn recipe-xml->edn
  "Summarize a parsed recipe as its boil size and its ingredients by type"
  [recipe]
  (let [boil-size    (l->gal (first (extract-tag :BOIL_SIZE recipe)))
        fermentables (xml->map (extract-tag :FERMENTABLES recipe))
        hops         (xml->map (extract-tag :HOPS recipe))
        extras       (xml->map (extract-tag :MISCS recipe))
        yeasts       (xml->map (extract-tag :YEASTS recipe))]
    {:boil-size    boil-size
     :fermentables fermentables
     :hops         hops
     :yeasts       yeasts
     :extras       extras}))
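The kg->lb and l->gal helpers used above aren’t shown in this post. Here is a minimal sketch of what they might look like, assuming they receive the raw text of a tag (the parser hands values back as strings) and convert to pounds and US gallons with the standard factors:

```clojure
;; Assumed helpers for illustration; the real brew-bot versions may differ.
(def ^:private lb-per-kg 2.20462262)   ; pounds in one kilogram
(def ^:private gal-per-l 0.264172052)  ; US gallons in one liter

(defn kg->lb
  "Parse a kilogram amount from text and convert it to pounds."
  [kg-text]
  (* lb-per-kg (Double/parseDouble kg-text)))

(defn l->gal
  "Parse a liter amount from text and convert it to US gallons."
  [l-text]
  (* gal-per-l (Double/parseDouble l-text)))

(l->gal "18.93") ; roughly a 5 gallon boil
```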
In the end, I had something like this:
[{:boil-size 5.00
  :fermentables {"American 2-Row" 5.00,
                 "Munich Malt" 1.25}
  :hops {"Galaxy" 1.25,
         "Citra" 0.50}
  :yeasts {"Wyeast 1007" 1}},
 ...]
But the work didn’t stop there. brew-bot’s smarter recipe generation modes expect ingredients to be weighted, meaning that you set the relative probability of the ingredients you care about, and it makes its selections with those probabilities. To extract these weights, I divided each ingredient amount by the total gallons for that batch. Then, I summed the results for each matching name across all of the recipes.
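Those two steps can be sketched roughly as follows. The function names and the second sample recipe are made up for illustration, and the real brew-bot code may differ:

```clojure
(defn per-gallon
  "Divide every ingredient amount by the batch's boil size."
  [boil-size ingredients]
  (into {}
        (map (fn [[ingredient amount]] [ingredient (/ amount boil-size)]))
        ingredients))

(defn ingredient-weights
  "Sum the per-gallon amounts of one ingredient type across recipes."
  [ingredient-type recipes]
  (apply merge-with +
         (map #(per-gallon (:boil-size %) (get % ingredient-type))
              recipes)))

;; Hypothetical sample data in the shape produced by recipe-xml->edn.
(def sample-recipes
  [{:boil-size 5.0
    :fermentables {"American 2-Row" 5.0, "Munich Malt" 1.25}}
   {:boil-size 2.5
    :fermentables {"American 2-Row" 2.5}}])

(ingredient-weights :fermentables sample-recipes)
;; => {"American 2-Row" 2.0, "Munich Malt" 0.25}
```

Dividing by batch size first keeps a 10 gallon batch from counting twice as much as a 5 gallon batch of the same recipe.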
Easy, right? No. Nothing ever is. As any programmer can tell you, most data is bad. Of the few hundred recipes I scraped, I found the following anomalies:
- A lot of misplaced records. A few mashtuns labeled as fermentable ingredients. BJCP styles labeled as hops. There was a lot of stuff all over the place.
- A lot of misspelled names. I’ve seen basically every permutation of “Munich” there is at this point.
- Some really, really strange choices. Now, while I love experimenting with beer, my tinkering is driven by some forethought. I like to think that flavor can be reasoned about, and a Chocolate Habanero Oktoberfest goes against all reason.
So, I spent roughly twice the time I had spent on the parser on cleanup and “handling” outliers. However, the end result was well worth it. brew-bot read the data in flawlessly, and kicked out a rather delicious recipe.
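I won’t reproduce the cleanup code here, but the misspelled-name part could be sketched as a hand-built alias table. Everything below, including the misspellings, is a made-up illustration rather than what I actually ran:

```clojure
(require '[clojure.string :as str])

;; Hypothetical alias table: lower-cased spelling -> canonical name.
(def aliases
  {"munic malt"  "Munich Malt"
   "munich malt" "Munich Malt"})

(defn canonical-name
  "Look up a trimmed, lower-cased name in the alias table,
   falling back to the name as given."
  [ingredient]
  (get aliases (str/lower-case (str/trim ingredient)) ingredient))

(defn clean-ingredients
  "Rename ingredients to their canonical names, summing amounts that collide."
  [ingredients]
  (reduce-kv (fn [acc ingredient amount]
               (update acc (canonical-name ingredient) (fnil + 0) amount))
             {}
             ingredients))

(clean-ingredients {"Munic Malt" 1.0, "munich malt " 0.5, "Citra" 0.5})
;; => {"Munich Malt" 1.5, "Citra" 0.5}
```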
The Long Con
So, with that experiment under my belt, I had a lot of longer term decisions to make before my next iteration on brew-bot. First and foremost was whether I would continue to support BeerXML. While I wanted to retreat back to the safety of a JSON-only ecosystem, it’s a hard sell when it comes down to a widely accepted standard.
So, naturally, I gave up on the format I wanted in favor of the one available to me.
Psych.
I am a programmer after all, and I decided I could have my cake and eat it too. To that end, I spun up a repo in our newly minted GitHub organization: common-beer-format. Using clojure.spec, I implemented the rules and structure of BeerXML in an easier-to-digest form, and provided tools to convert to and from XML, JSON, and EDN. The new paint smell is just now wearing off, and I’m preparing to integrate it with brew-bot.
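To give a flavor of what codifying an ingredient with clojure.spec can look like, here is a small sketch. The spec names and rules are illustrative assumptions, not the actual contents of common-beer-format:

```clojure
(require '[clojure.spec.alpha :as s])

;; Illustrative specs only; the library's real specs may look different.
(s/def ::name string?)
(s/def ::amount (s/and number? (complement neg?)))
(s/def ::fermentable (s/keys :req-un [::name ::amount]))

(s/valid? ::fermentable {:name "Munich Malt" :amount 1.25}) ; => true
(s/valid? ::fermentable {:name "Munich Malt"})              ; => false, no amount
```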
If you’re looking to start developing something with common-beer-format, you can include it as a dependency from Clojars.
If you want to learn more about the specifics of codifying beer recipes, stay tuned.