JSON

My SimpleXML article has been sitting there for too long. I had apparently been living under a rock, and the whole idea with SimpleXML was to make XML act like JSON, which I had yet to become acquainted with.

Please, oh please, if you have any choice in the matter, do not use XML to pass data between applications. Both you and the folks that consume your services will be more productive if you just move to a *real* data structure to pass your data.

Pythoners - I recommend SimpleJSON for JSON support. For JS, google json2.js. For C#... I wrote my own class to mimic the flexibility of JS objects (response["result"]["message"].Str = "Success" etc); I think dynamic classes might fit the bill, but that's C# 4.0.

TreeXML for Python (et al)

I've finished the XML encoder for TreeXML. It's just as simple as the decoder.

I don't know how useful all of this really is to web service folks at large. I know just from personal experience that DOM code is a pain in the neck to learn, and ugly to look at. I needed something clean and concise to build my web service.

The funny thing is that I don't even feel I've sacrificed any flexibility with the restrictions I'm imposing on XML. Rather, I feel I'm cutting off tons of fat; XML allows for doing things in a variety of different ways. We only really need one way to do each thing.

Here is my proposal for TreeXML:

TreeXML is a subset of XML, and it is designed to be compatible with existing XML parsers. A number of restrictions are placed on the schema to allow a TreeXML parser to convert into a native data structure in any language that supports dictionaries and lists (it could be done with only lists, as well).

This requires a few restrictions, namely:

- Attributes are not allowed.
- Text nodes may only appear at the lowest level of the tree ("leaves").

Each node may contain either a string or a list of child nodes. That's it.

Find out more and get source code on my TreeXML page.

Python & TreeXML

So about a half hour into my epic adventure to remake XML to fit better amidst my lovely Pythonic code, I find that the solution is all too simple. (at least, it works so far... I won't guarantee that it's bug free yet!)

I chose to make each dictionary entry always lead to a list, instead of leading directly to another node when there was only one tag with that name. I felt that while this doesn't make simple code simpler, it does make it much simpler to deal with lists. Real improvement beyond this point would come in the form of hacking the language to do what I want to do, and that's usually counterproductive (and I don't have that much in-depth experience with Python, either).

from xml.dom import minidom

def __parse_node(node):
    d = {}
    if node.childNodes[0].nodeType == 3:
        return node.childNodes[0].data
    for child in node.childNodes:
        name = child.localName
        value = __parse_node(child)
        if name not in d:
            d[name] = [value]
        else:
            d[name].append(value)
    return dict(d)

def parse(xml):
    doc = minidom.parseString(xml)
    return __parse_node(doc)

xml = ('<?xml version="1.0" ?><response><status><type>OKAY</type>' + 
       '<desc>Everything is just fine.</desc>' +
       '</status><time>2:08 AM</time><time>5:43 PM</time></response>')
doc = parse(xml)
print doc
print doc['response'][0]['status'][0]['desc'][0]
print 'times: %s' % ', '.join(doc['response'][0]['time'])

This yields the following output:

{u'response': [{u'status': [{u'type': [u'OKAY'], u'desc': [u'Everything is just fine.']}], 
u'time': [u'2:08 AM', u'5:43 PM']}]}
Everything is just fine.
times: 2:08 AM, 5:43 PM

Now it's time to write the XML-encoder to reverse the process.

XML and Python

For a recent project, I've been trying to integrate XML messaging into a Python gateway for a web service. I've been finding that the marriage of the two is not very Pythonic.

The problem is that I don't really like having code lying around that looks like this (exaggerated a little):

doc.getElementsByTagName('response')[0].getElementsByTagName('status')[0].getElementsByTagName('desc')[0].firstChild.data

That is obscene. This is the 21st century. And I'm using Python, not COBOL.

I want to use:

doc['response']['status']['desc']

to do the same thing. I think that with a little restriction on XML, that can be done.

The idea is to build a tree from Python dictionaries, lists, and strings. Attributes are not permitted. Sibling nodes with the same tag are permitted; this feature is too valuable to discard, though it does break the pure tree model.

Here is some sample XML and the tree I would build from it:

<?xml version="1.0" ?>
<response>
  <status>
    <type>OKAY</type>
    <desc>Everything is just fine.</desc>
  </status>
  <time>5:43 PM</time>
</response>
{'response': {'status': {'type': 'OKAY', 'desc': 'Everything is just fine.'}, 'time':'5:43 PM'}}