Traversing an XML Document
Problem
How do I work with XML documents?
Solution
Clojure has powerful libraries for processing XML documents. One low-level approach is to use the function clojure.xml/parse, which reads a document and parses it into a map representing the root element, with its child elements nested inside it. parse accepts a File, an InputStream, or a String naming a URI (not a String of raw XML text) as its argument.
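Because a String argument is treated as a URI, XML text that is already in memory has to be wrapped in an InputStream first. The sketch below shows one way to do that; `parse-xml-string` is a hypothetical helper name, and the UTF-8 encoding is an assumption about the input.

```clojure
(require '[clojure.xml :as xml])

;; clojure.xml/parse treats a String argument as a URI, so raw XML text
;; is converted to bytes and wrapped in a ByteArrayInputStream instead.
(defn parse-xml-string
  "Hypothetical helper: parses XML text held in memory (assumed UTF-8)."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def doc
  (parse-xml-string "<holiday type=\"State\"><name>Kamehameha Day</name></holiday>"))

(:tag doc)     ;; => :holiday
(:attrs doc)   ;; => {:type "State"}
(:content doc) ;; => [{:tag :name, :attrs nil, :content ["Kamehameha Day"]}]
```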
Suppose the following XML document is located in a file named "calendar.xml":
<?xml version="1.0"?>
<calendar>
<holiday type="International">
<name>International Lefthanders Day</name>
<date>
<month>August</month>
<day>13</day>
</date>
</holiday>
<holiday type="Personal">
<name>Rover's birthday</name>
<date>
<month>October</month>
<day>12</day>
</date>
</holiday>
<holiday type="National">
<name>Groundhog Day</name>
<date>
<month>February</month>
<day>2</day>
</date>
</holiday>
<holiday type="State">
<name>Kamehameha Day</name>
<date>
<month>June</month>
<day>11</day>
</date>
</holiday>
</calendar>
parse returns a map with three keys:
(use '[clojure.xml :only (parse)])
(import '(java.io File))
(def xml-doc (parse (File. "calendar.xml")))
(keys xml-doc) => (:tag :attrs :content)
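Since every element node has the same three keys, map destructuring can pull them apart in one step. This is a small sketch using an inline document rather than calendar.xml; `describe-element` is a hypothetical helper, not part of clojure.xml.

```clojure
(require '[clojure.xml :as xml])

(def doc
  (xml/parse (java.io.ByteArrayInputStream.
               (.getBytes "<calendar><holiday type=\"National\"/></calendar>" "UTF-8"))))

(defn describe-element
  "Hypothetical helper: summarizes an element as [tag attrs child-count]."
  [{:keys [tag attrs content]}]
  ;; An empty element has :content nil, and (count nil) is 0.
  [tag attrs (count content)])

(describe-element doc)                    ;; => [:calendar nil 1]
(describe-element (first (:content doc))) ;; => [:holiday {:type "National"} 0]
```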
The :tag of the root element:
(:tag xml-doc) => :calendar
It has no attributes but contains 4 child elements:
(:attrs xml-doc) => nil
(count (:content xml-doc)) => 4
The first child element is a <holiday> element:
(def holiday (first (:content xml-doc)))
(:tag holiday) => :holiday
(:attrs holiday) => {:type "International"}
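The :attrs value is an ordinary map from keyword to string, so get-in can reach an attribute directly, and mapping over :content collects it from every child. A minimal sketch on an inline two-holiday document (an assumption standing in for calendar.xml):

```clojure
(require '[clojure.xml :as xml])

(def doc
  (xml/parse (java.io.ByteArrayInputStream.
               (.getBytes (str "<calendar>"
                               "<holiday type=\"International\"/>"
                               "<holiday type=\"Personal\"/>"
                               "</calendar>")
                          "UTF-8"))))

;; Look up the :type attribute of each direct child of <calendar>.
(map #(get-in % [:attrs :type]) (:content doc))
;; => ("International" "Personal")
```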
The holiday contains two children of its own, a <name> element and a <date> element:
(:content holiday) =>
[{:tag :name, :attrs nil, :content ["International Lefthanders Day"]}
{:tag :date, :attrs nil, :content [{:tag :month, :attrs nil, :content ["August"]} {:tag :day, :attrs nil, :content ["13"]}]}]
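Because :content is a vector, as the output above shows, get-in with integer indices can walk a fixed path into the tree. The sketch below rebuilds the first holiday from an inline string; note that this style depends on the children always appearing in the same order.

```clojure
(require '[clojure.xml :as xml])

(def holiday
  (xml/parse (java.io.ByteArrayInputStream.
               (.getBytes (str "<holiday type=\"International\">"
                               "<name>International Lefthanders Day</name>"
                               "<date><month>August</month><day>13</day></date>"
                               "</holiday>")
                          "UTF-8"))))

;; 2nd child (<date>) -> 1st child (<month>) -> its text node.
(get-in holiday [:content 1 :content 0 :content 0]) ;; => "August"
;; 2nd child (<date>) -> 2nd child (<day>) -> its text node.
(get-in holiday [:content 1 :content 1 :content 0]) ;; => "13"
```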
Navigating the nested maps by hand quickly becomes tedious. A more convenient, higher-level approach is the function clojure.core/xml-seq, which takes the tree returned by parse and produces a lazy sequence of all its nodes, element maps and text strings alike, in depth-first order:
(map (fn [elt] (or (:tag elt) elt)) (xml-seq xml-doc)) =>
(:calendar
:holiday :name "International Lefthanders Day" :date :month "August" :day "13"
:holiday :name "Rover's birthday" :date :month "October" :day "12"
:holiday :name "Groundhog Day" :date :month "February" :day "2"
:holiday :name "Kamehameha Day" :date :month "June" :day "11")
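Since xml-seq mixes element maps and text strings, ordinary sequence functions can pick out either kind. A short sketch on an inline one-holiday document (standing in for calendar.xml):

```clojure
(require '[clojure.xml :as xml])

(def doc
  (xml/parse (java.io.ByteArrayInputStream.
               (.getBytes (str "<calendar><holiday>"
                               "<name>Groundhog Day</name>"
                               "<date><month>February</month><day>2</day></date>"
                               "</holiday></calendar>")
                          "UTF-8"))))

;; Keeping only the strings yields the document's text in reading order.
(filter string? (xml-seq doc))
;; => ("Groundhog Day" "February" "2")

;; Keeping only maps with a given tag finds matching elements at any depth.
(map :content (filter #(= :month (:tag %)) (xml-seq doc)))
;; => (["February"])
```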
We can use a sequence comprehension (for) to extract the relevant fields from each holiday:
;; Each helper navigates by position: a holiday's first child is <name>,
;; its second child is <date>, and <date> contains <month> then <day>.
(defn holiday-name [holiday] (first (:content (first (:content holiday)))))
(defn holiday-month [holiday] (first (:content (first (:content (second (:content holiday)))))))
(defn holiday-day [holiday] (first (:content (second (:content (second (:content holiday)))))))
(for [elt (xml-seq xml-doc) :when (= :holiday (:tag elt))]
  [(holiday-name elt) (holiday-month elt) (holiday-day elt)]) =>
(["International Lefthanders Day" "August" "13"]
["Rover's birthday" "October" "12"]
["Groundhog Day" "February" "2"]
["Kamehameha Day" "June" "11"])
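The positional helpers above break if a document lists <date> before <name>. One alternative, sketched here under the assumption that looking children up by tag is acceptable, uses two hypothetical helpers, `child` and `text-at`. The inline test document deliberately puts <date> first and <day> before <month> to show that the lookup no longer depends on order.

```clojure
(require '[clojure.xml :as xml])

(defn child
  "Hypothetical helper: first child element of elt whose tag is tag, or nil."
  [elt tag]
  (first (filter #(= tag (:tag %)) (:content elt))))

(defn text-at
  "Hypothetical helper: follows a path of tags from elt and returns the
  first text node of the element it reaches (nil if the path is missing)."
  [elt & tags]
  (first (:content (reduce child elt tags))))

(def doc
  (xml/parse (java.io.ByteArrayInputStream.
               (.getBytes (str "<calendar><holiday type=\"State\">"
                               "<date><day>11</day><month>June</month></date>"
                               "<name>Kamehameha Day</name>"
                               "</holiday></calendar>")
                          "UTF-8"))))

(for [h (xml-seq doc) :when (= :holiday (:tag h))]
  [(text-at h :name) (text-at h :date :month) (text-at h :date :day)])
;; => (["Kamehameha Day" "June" "11"])
```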