September 08, 2003
Pathan 2.0 alpha provides XPath 2.0 implementation
DecisionSoft has released an alpha version of Pathan 2.0 that implements XPath 2.0.
At the same time they also released Pathan 1.2 release 2 that updates it to use Xerces 2.3 and fixes a few bugs. This release is important to users of Berkeley DB XML.
September 02, 2003
Excelsior
Michael Tsai has posted a pointer to Excelsior!, an XML data binding tool for Cocoa apps. It looks pretty decent and includes a mechanism that uses XPath-style accessors to retrieve pieces of data (it's not actually XPath, it just looks similar). This is a much-needed tool for Cocoa applications.
August 27, 2003
Updated XQuery drafts
The W3C XML Query working group has released updates to five of the drafts that make up XQuery 1.0 and XPath 2.0.
- XML Path Language (XPath) 2.0
- XQuery 1.0: An XML Query Language
- XQuery 1.0 and XPath 2.0 Formal Semantics
- XML Query Use Cases
- XPath Requirements Version 2.0
If you're having trouble sleeping, reading these should help you out.
August 26, 2003
XSL-T to convert ISO8601 date to RFC 822 date
This is a partial implementation of an XSL-T template to convert an ISO 8601 date to an RFC 822 date. It isn't exactly correct, but it can serve as a starting point if you really need such a thing. I'm just posting it in case it might be useful to someone. I was going to use it to generate an RSS 2.0 feed, but decided to abandon it and just use Dublin Core dates, which are already in ISO 8601 format. This depends on the EXSLT date functions.
<!-- Converts a date in ISO 8601 format into a date in RFC 822 format -->
<!-- Requires xmlns:date="http://exslt.org/dates-and-times" on the stylesheet element -->
<xsl:template name="rfc822Date">
<xsl:param name="isoDate"/>
<xsl:value-of select="date:day-abbreviation($isoDate)"/>, <xsl:value-of select="date:day-in-month($isoDate)"/><xsl:text> </xsl:text>
<xsl:value-of select="date:month-abbreviation($isoDate)"/><xsl:text> </xsl:text>
<xsl:value-of select="date:year($isoDate)"/><xsl:text> </xsl:text>
<!-- RFC 822 time fields are two digits each, so pad hours, minutes and seconds -->
<xsl:value-of select="format-number(date:hour-in-day($isoDate), '00')"/>:<xsl:value-of select="format-number(date:minute-in-hour($isoDate), '00')"/>:<xsl:value-of select="format-number(date:second-in-minute($isoDate), '00')"/>
<!-- the timezone offset is currently hardcoded and needs to be fixed -->
<xsl:text> -0700</xsl:text>
</xsl:template>
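For comparison, here's a minimal Python sketch of the same conversion using only the standard library (datetime.fromisoformat() and email.utils.format_datetime(), Python 3.7 or later). Unlike the template, it takes the timezone offset from the input instead of hardcoding it. The rfc822_date name is just for illustration, and the sketch assumes the input is a full ISO 8601 date-time; one without an offset is rendered with the conventional "-0000" unknown-zone marker.

```python
from datetime import datetime
from email.utils import format_datetime


def rfc822_date(iso_date):
    """Convert an ISO 8601 date-time string into an RFC 822 style date."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if iso_date.endswith("Z"):
        iso_date = iso_date[:-1] + "+00:00"
    dt = datetime.fromisoformat(iso_date)
    # format_datetime() zero-pads every field and writes the offset as +HHMM;
    # naive datetimes (no offset in the input) get "-0000".
    return format_datetime(dt)


print(rfc822_date("2003-08-26T10:05:03-07:00"))  # Tue, 26 Aug 2003 10:05:03 -0700
```

The day and month names come from fixed tables inside the email package, so the output doesn't depend on the current locale.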
August 24, 2003
libxslt Extension Functions in Python
This is just a note about writing extension functions for libxslt. The documentation on this is slim at best and the only example that comes with libxslt is minimal.
I'm working on a project where I wanted to run a piece of text through a textile processor and then insert the result into the result tree of the XSL document. I thought this would be pretty simple, but it turned out to be a bit more work than I expected. Here's the code that finally works.
Update: the original version of this code inserted the output directly into the tree. That's probably not what you want to really do, so I updated it to return the content.
import sys

import libxml2
import libxslt
import textile  # third-party Textile processor

def textile_process(ctx, content):
    """
    An XSL-T extension function to process textile formatting included in a post.
    """
    try:
        # Node-set arguments arrive as a list of raw PyCObjects; wrap the first one
        node = libxml2.xmlNode(_obj=content[0])
        # Unwrap the XPath parser context to reach the transformation context
        parserContext = libxslt.xpathParserContext(_obj=ctx)
        xpathContext = parserContext.context()
        resultContext = xpathContext.transformContext()
        # Wrap the Textile output in one element so it parses as a document
        source = "<div>" + textile.textile(node.content) + "</div>"
        doc = libxml2.parseDoc(source)
        root = doc.getRootElement()
        root.unlinkNode()
        # If you do this you insert the result directly into the output tree
        # resultContext.insertNode().addChild(root)
        return [root]
    except Exception as err:
        sys.stderr.write("Context error " + str(err) + "\n")
        return ""
The tricky parts about this code are converting all parameters into the right types and getting to the pieces of the documents that you need. The problem is that the parameters come in as raw PyCObject instances so you have to convert to the more specific object types manually.
For the way I was using this function, the content parameter contains a list with one item, and that item is an xmlNode. So the line
node = libxml2.xmlNode(_obj=content[0])
converts the PyCObject into an xmlNode instance that you can then use as you usually would. You have to do similar things with the xpathParserContext that comes in as the first parameter. From the xpathParserContext you then have to get an xpathContext and from that you get the context for the result document. The xpathContext variable provides a reference into the source document and the resultContext variable provides access to the document being created.
Update: this paragraph applies to the commented-out line in the original code. One confusing thing about the resultContext is the insertNode() method. At first I thought it inserted a new node into the result document; what it really does is return the node under which the result of the function will be inserted. It would probably have been clearer if the method had been named getInsertionNode() or something like that.
Now that this is working it's pretty slick, but it sure took a while to figure out exactly what needed to happen.
August 21, 2003
libxml2 2.5.10 and libxslt 1.0.32 released
Daniel Veillard has released new versions of libxml2 and libxslt, classifying the libxml2 release as a "major bugfix release", and writing that libxml2 2.5.9 and 2.5.10 "include a lot of bugfixes spanning the whole library; upgrading is strongly recommended." The libxslt 1.0.32 release is also significant in that it is the first to include Python bindings for extension elements. [xmlhack]
I've been using libxml2 in Python a lot lately. It's the first XML parser I actually like, mainly because of the very convenient XPath API.
August 15, 2003
PostgreSQL and XML
I just came across another project to add some XML support to PostgreSQL, xpsql. It looks like it's pretty rough at this point, but development is ongoing. There's also an older post about some different support, but I don't know if the code was ever released. I'm kind of surprised this type of thing has been so slow in coming.
August 14, 2003
A peek at X#
Linked from this blog entry is a paper that gives some insight into Microsoft's XML programming language that has been called X# by some people. It looks like they're doing more than just integrating XML into the language; they're also integrating relational database access. The paper's a very interesting read.
August 11, 2003
eXist 0.9.2 Released
Wolfgang Meier just announced the release of eXist 0.9.2. Here's the text of the announcement.
I'm pleased to announce that release 0.9.2 is now available on sourceforge.
For those who have not been able to follow the discussions on this list,
here's a quick summary of changes:
This is the first official release with support for XUpdate. Also, much effort
has been invested to ensure that other character encodings than Latin 1 are
correctly processed by the database as well as the query engine. This applies
in particular to East Asian languages and scripts. Further changes include:
important missing parts of the XPath spec have been implemented, more
synchronization and database corruption issues have been addressed,
interfaces improved, and dozens of bugs fixed.
August 10, 2003
PyObjC gearing up for 1.0
This week Ronald Oussoren posted the following on the PyObjC mailing list: I think it is about time to do a 1.0 release. All bugs I know of have been fixed, and the end-user documentation is good enough. That doesn't... [Artima Python Buzz] Excellent! PyObjC is a very cool project, especially since I like the XML support in Python so much better than what's available in Objective-C.
Namespace training wheels
The namespaces in XML debate just never dies, Jon Udell has a new take on his perspective at InfoWorld.com.
August 09, 2003
Installing Berkeley DB XML on Mac OS X with Python and Perl API support
I just wanted to post some notes about installing Sleepycat Berkeley DB XML on Mac OS X 10.2 with Perl and Python support. The builds are relatively straightforward, and Sleepycat has posted a simple script to help build Berkeley DB XML itself. However, it isn't clear what is necessary to get Perl and Python working.
The most important thing: before you start compiling anything, make sure you have the latest GCC 3.3 from Apple. This is distributed as a patch to the December 2002 developer tools. This is critical; without it Python and Perl support will not work.
Next, unfortunately, you'll have to build a new Perl and Python. The Mac OS X 10.2 Python should be the right version, but I couldn't get it to work. Building a fresh Python 2.3 does work. For Perl, Mac OS X includes Perl 5.6 and Berkeley DB XML requires 5.6.1 so you have to build a new one. I used Perl 5.8.0 and it seems to work fine. So you have to build a new Python, a new Perl and the Berkeley DB XML distribution. These should all build using the standard instructions and for DB XML you can use their script.
Once you have all that built, you can then build the DB XML Perl and Python libraries.
For Python, you first need to build and install bsddb3; once that's done you can build the Python support for DB XML in the usual Python fashion. Make sure the python you're using is the one you built previously. Unless you specified otherwise, it's installed in /usr/local/bin/python.
cd dbxml-1.1.0/src/python
/usr/local/bin/python setup.py build
sudo /usr/local/bin/python setup.py install
There's an example Python program in dbxml-1.1.0/examples/python/examples.py that you can run to test the build.
For Perl you just build it in the usual Perl manner. Again, make sure you use the perl you compiled.
cd dbxml-1.1.0/src/perl
/usr/local/bin/perl Makefile.PL
make
sudo make install
There are some examples for the Perl API in dbxml-1.1.0/src/perl/examples.
XML Document Construction With Python and libxml2
The libxml Python API is very lightly documented, so this is an attempt to fill in some of the holes that exist.
Creating a new document
To create a new document using the libxml2 API, you use a document constructor function that returns an empty document instance. This function takes one argument that represents the XML version of the document being created.
import libxml2
doc = libxml2.newDoc("1.0")
Creating elements
Once you have a document instance you then need to add elements to it. First off you need to create the root element.
root = doc.newChild(None, "root-element", None)
The root node is created using the xmlDoc.newChild() method. This method takes three parameters.
- namespace - The namespace object (an xmlNs, such as the one returned by newNs()) that the element should belong to, or None if no namespace.
- node name - The name of the node with no namespace prefix.
- element content - The content for the element, or None if the element is empty.
In this particular case we're creating an empty element named root-element. If we were to print this out at this point it would look something like this.
<?xml version="1.0"?>
<root-element/>
If we wanted to put the node into a namespace, we would write this instead.
root = doc.newChild(None, "root-element", None)
namespace = root.newNs("http://example.com/sample", "sample")
root.setNs(namespace)
The resulting document then becomes.
<?xml version="1.0"?>
<sample:root-element xmlns:sample="http://example.com/sample"/>
Now that we've created the root, we can continue adding elements to the document. We can add an element child-node in the http://example.com/sample namespace with the following.
child = root.newChild(namespace, "child-node", None)
And our document now looks like
<?xml version="1.0"?>
<sample:root-element xmlns:sample="http://example.com/sample">
<sample:child-node/>
</sample:root-element>
If we had wanted to include some text within the added child, it's as simple as changing the third parameter to newChild.
child = root.newChild(namespace, "child-node", "Some sample text")
Which generates the document
<?xml version="1.0"?>
<sample:root-element xmlns:sample="http://example.com/sample">
<sample:child-node>Some sample text</sample:child-node>
</sample:root-element>
Adding an attribute to an element is also very easy.
child = root.newChild(namespace, "child-node", "Some sample text")
child.setProp("an-attribute", "with a value")
Which of course generates a document that looks like this.
<?xml version="1.0"?>
<sample:root-element xmlns:sample="http://example.com/sample">
<sample:child-node an-attribute="with a value">Some sample text</sample:child-node>
</sample:root-element>
If you wanted the attribute to be part of a namespace, you use setNsProp instead of setProp.
child = root.newChild(namespace, "child-node", "Some sample text")
child.setNsProp(namespace, "an-attribute", "with a value")
And the result
<?xml version="1.0"?>
<sample:root-element xmlns:sample="http://example.com/sample">
<sample:child-node sample:an-attribute="with a value">Some sample text</sample:child-node>
</sample:root-element>
Besides simple elements and attributes, libxml defines methods to create all the other common XML node types. Here's a summary of the methods that are available.
- xmlDoc.newDocComment(comment) - Creates a comment node.
- xmlDoc.newCDataBlock(content, length) - Creates a CDATA section.
- xmlDoc.newDocText(content) - Creates a new text node.
These methods are all node construction methods that are called to create an instance of the required type. Once you have the instance, you then need to add it into the document tree wherever you want it. There's also a function available to create processing instructions. This function differs in that it is called on the libxml2 module, rather than on an xmlDoc instance.
- libxml2.newPI(name, content) - Creates a processing instruction.
Since these functions require you to create the node and then add it to the document in two steps, libxml provides a number of methods to control where the node is placed in the document tree. These methods are available on any instance of an xmlNode.
- xmlNode.addChild(node) - Appends the new node to the list of children for the node.
- xmlNode.addChildList(nodeList) - Appends a chain of new sibling nodes (passed as the first node of the chain) to the children for the node.
- xmlNode.addNextSibling(node) - Adds the new node as a sibling immediately after the selected node.
- xmlNode.addPrevSibling(node) - Adds the new node as a sibling immediately before the selected node.
- xmlNode.addSibling(node) - Adds the new node at the end of the selected node's list of siblings (unlike addNextSibling, which inserts right after the node).
- xmlNode.addContent(content) - Appends additional text content to an element.
Here's an example that puts everything together.
#!/usr/local/bin/python
import libxml2
doc = libxml2.newDoc("1.0")
root = doc.newChild(None, "root-element", None)
namespace = root.newNs("http://example.com/sample", "sample")
root.setNs(namespace)
child = root.newChild(namespace, "child-node", "Some sample text")
child.setNsProp(namespace, "an-attribute", "with a value")
comment = doc.newDocComment("Just commenting")
child.addPrevSibling(comment)
pi = libxml2.newPI("a-sample-pi", "with some useless content")
root.addPrevSibling(pi)
text = doc.newDocText(" This will be added to the existing text.")
child.addChild(text)
child.addContent(" This will also be added to the text")
print(doc.serialize(None, 1))
And a final result.
<?xml version="1.0"?>
<?a-sample-pi with some useless content?>
<sample:root-element xmlns:sample="http://example.com/sample">
<!--Just commenting-->
<sample:child-node sample:an-attribute="with a value">Some sample text This will be added to the existing text. This will also be added to the text</sample:child-node>
</sample:root-element>
August 06, 2003
Saxon does XQuery
I just discovered that Saxon now includes support for XQuery. Probably old news for people in the XSL community, but news to me. I'm not much of a fan of XQuery, but it looks like Saxon will finally bring a usable implementation to play with. There's a discussion starting about using it to implement XQuery in Xindice. Unfortunately, it's under the Mozilla Public License, which will probably kill that idea.
August 04, 2003
Xindice Wiki
Just discovered there's a Wiki set up for the Xindice project. That's a very good thing to see, as one of the major problems with the Apache projects is that it's too much trouble to update the web site regularly, and this leads to stale sites. The Xindice web site is a prime example of this. I was looking at it the other day and noticed it doesn't even have any links to where you can download Xindice 1.1. The problem is that it's a multistep process: you have to write the content in XML using Forrest, build the site, make sure it's right and then copy it up to the server. It doesn't seem like that big of a deal, but for me it was enough to get in the way of making as many updates as I'd have liked. It seems this has carried forward with the current committers as well.
The other major benefit of the Wiki is that anyone can edit it and this leads to incremental development of the content. There's no bottleneck waiting for someone to come in and publish things.
July 28, 2003
Xindice Project Web Site Statistics
I just noticed that there's a page showing interesting graphs of statistics for various Apache projects. The page for Xindice is interesting. Looks like Xindice is downloaded roughly 7,000 times per month. I'm sure this was probably posted on the mailing list at some point, but I completely missed it. It's interesting to compare it to the stats I posted shortly after the dbXML project was moved to the ASF to become Xindice.
July 23, 2003
Computerworld on Native XML Databases
Computerworld has an article looking for success stories in using native XML databases.
I'm also interested in this topic, in particular in success stories using any of the Open Source native XML databases, especially Sleepycat Berkeley DB XML, Xindice and eXist. If you have any, please feel free to contact me. I'm looking at a number of upcoming writing projects that revolve around this topic.
July 11, 2003
Xindice Project Needs Help
In order to get a final Xindice 1.1 release out, the Xindice project needs more help.
In particular someone is needed to complete the work on building a standalone distribution that uses Jetty as the container. Also extremely important, the documentation needs major updates to account for the significant changes that have been introduced since the 1.0 release.
Xindice 1.1b2 Released
Xindice 1.1 beta 2 has been released.
Just the next step to finally getting a 1.1 stable release out.
July 09, 2003
I've received a great "honor"
Mike Champion was kind enough to let me know that I've received the great honor of being debunked by the great Fabian Pascal. What's funny is that the email he's picking on was written at least two years ago, maybe even longer. I'm sure much of what he says is true, or maybe not, who knows; I just laughed when I read it. I have no idea why he feels that a mailing list posting from a couple of years ago is worthy of his time.
July 04, 2003
LDAP Exhaustion and XQuery Lamenting
Spent the day today in a data center installing a couple of LDAP servers for a client. It's been a while since I've spent any amount of time inside a data center and I'd forgotten how exhausting it can be. The noise and the cold air blasting through the floor really take it out of you.
LDAP is an interesting technology; it's what sparked my interest in semi-structured data, which led me to working on native XML databases. I hadn't worked with it in quite a few years, but it's a good example of a stable standard. In fact the LDAP protocol itself hasn't changed at all in that time period. The products have of course matured, but it's all still the exact same concepts and at the low levels the details are the same. A refreshing change compared to the spec-a-week mess that XML has become.
Fortunately the core specs in XML (XML 1.0, Namespaces, XPath 1.0, XSL-T 1.0) have now proven to be stable and I suspect we'll start to see the weaker (and much more complex) later specs beginning to drop off the radar. It's too bad, but in a lot of ways just about everything after the release of XSL-T 1.0 seems pretty irrelevant. This isn't altogether a good thing. The current XPath 1.0 and XSL-T 1.0 specs definitely have room for improvement. I just wonder if XPath 2.0 and XSL-T 2.0 are going to provide that improvement without drowning under the added complexity that being associated with XQuery has introduced.
I've gotten to the point where I don't even pay much attention to XQuery anymore. Maybe that's not a good thing, but I just don't see any real-world interest in it. In particular, it's pretty much non-existent in the Open Source world. None of the big three Open Source XML databases (eXist, Xindice, Sleepycat DbXML) support it, and I kind of doubt that they ever will. The sad reality is that it's just not asked for all that often.
Once upon a time there was a real need for XQuery (or at least there was for XPath with joins and updates), but in the pursuit of academic perfection the complexity has mounted, the number of associated specs has mushroomed and the spec has delayed itself into irrelevance.
Why am I writing about this? I don't know, maybe I just wish that XML databases actually mattered anymore. There was an awful lot of waiting brought about by the presence of XQuery. It's provided a big cloud hanging over the whole XML database arena. Instead of focusing on doing profitable work with XPath and just adding the missing pieces, innovation stopped and everyone put their plans on hold pending the development of XQuery. Now, several years later, XQuery still isn't finished and products are finally shipping with incomplete XQuery implementations with proprietary extensions for things like updates. This stuff should have been added years ago, and now we have delayed products shipping with implementations that aren't going to interoperate anyway. So what have we gained? In my opinion, not much. In fact I think XQuery may end up killing the entire XML database market.
June 28, 2003
New Xindice 1.1 Build Available
I've been virtually silent about Xindice lately, but Kevin Ross has stepped up as a new leader within the project and has just announced the release of a new Xindice build. This is a 1.1 build and Kevin seems intent on getting a 1.1 release out, something I'm very happy to hear. My original plans called for Xindice 1.1 to be released almost 1.5 years ago. The project has struggled a lot since then and my dropping in and out of it hasn't helped much. I did manage to get a 1.1 beta build out a couple months ago, but again had to drop out of the project before making any more progress.
Xindice definitely needs some new blood among its developers. Kevin has stepped up as a new leader, but he needs a lot more help.
Panther and BeFS
Now this could be the biggest productivity boost of all. I would absolutely love to see a more BeFS-like file system in Mac OS X. There is so much power that can be brought out when the file system is more database-like. It enables all kinds of applications that currently require tons of extra software. For instance, much of the functionality of iTunes could be built directly on top of the file system rather than using a separate database. This doesn't mean that iTunes would be replaced by the Finder; it just means the implementation of these meta-data-heavy applications would be considerably simplified.
The problem, of course, is portability of the meta-data. It would bring back the whole resource fork problem where you have to specially handle files that move across platforms. That is definitely not a good thing. Because of this, what I'd also like to see is an open XML-based format for the transfer of files along with their meta-data. It still means special handling, but at least it would be in a format that would actually be useful someplace other than on a Mac. Maybe a bundle format (i.e. a directory) with the file in its usual form and the meta-data in a separate file. For transfer, simply zipping it up would work, and then PC users could just unzip it and access the file inside or use the meta-data for whatever they like.
Echo Project
Anyone who's interested in blogging tools may want to pay attention to the Echo project. From the site "The EchoProject is an initative to develop a common syntax for syndication, archiving and an editing API". It's an effort to consolidate all the RSS, Blogger API, MetaWebLog API, whatever mess into something more manageable.
Tim Bray is wondering if it's going to go off the rails. In my original comment on this, I expressed the same concern. It's the second system syndrome, developers just can't resist it.
If nothing else Echo is an interesting experiment in community oriented specification.
June 27, 2003
Artima Buzz Site
Yet another interesting use of RSS for building new applications. The Artima buzz site allows you to register your RSS feed for a particular category if it's something you post about often. I registered for the Mac OS X buzz since it seems that's what I mostly post about now.
May 10, 2003
Editors' Newswire for 8 May, 2003
Newswire stories, including: Dave Pawson on XPath 2.0. [xmlhack]
Lots of complaining about the infection of XPath 2.0 by the XQuery working group. I fear we're going to be stuck with XML 1.0, XPath 1.0 and XSL-T 1.0, as everything coming after is, to put it simply, scary.
I guess I'm adopting the perspective that as soon as W3C XML Schema touches something, it's doomed to wither on the vine. This is really too bad; XML is a tremendous tool, but its benefit is supposed to be simplicity. Using anything that W3C XML Schema has touched is just not simple.
Well formed XML is the useful core. You shouldn't have to pull in all that schema machinery to run an XPath against a well formed document. It will just bloat the size of implementations, books and headaches for little benefit. Schema support could have easily been partitioned out of the core of XPath 2.0 where it could fester and die from lack of real interest. Now the whole of XPath 2.0 will suffer that ignominious fate until someone defines an interoperable subset that jettisons all the schema garbage.
W3C XML Schema: Just Say NO!
May 07, 2003
XML query specs edge closer to completion
They would allow collections of XML files on the Web to be queried like databases. [Computerworld XML News]
TEN! TEN!!!! TEN!!!!!!! Working drafts for XQuery/XPath 2.0. Man, it just keeps getting worse. I'm curious: does anyone really care about XQuery anymore? It certainly could be useful, but who is going to be able to implement it? Ugh. Anyway, I'm glad to see they're adding full text, but it looks like updates are still missing. Oh well, it doesn't matter now.
March 27, 2003
Why XML Doesn't Suck
Recently in this space I complained that XML is too hard for programmers. That article got Slashdotted and was subsequently read by over thirty thousand people; I got a lot of feedback, quite a bit of it intelligent and thought-provoking. This note will argue that XML doesn't suck, and discuss some of the issues around the difficulties encountered by programmers.... [ongoing]
Another excellent bit of insight from Tim Bray.
My opinion: XML is awesome, and XML sucks. I also firmly believe it's the best path to lead us out of the proprietary system darkness. That is enough to keep me an XML believer. The potential here is huge; we just have to find our way to the light.
March 07, 2003
XinCJ - C++ Interface to Xindice
Hauke von Bremen sent me a link to XinCJ which is his C++ interface to Xindice. Looks like he mirrored the XML:DB API into C++.
February 22, 2003
JaxMe 1.53 with XML:DB API support
JaxMe 1.53 released with XML:DB API support. [xmlhack]
Cool, another XML:DB API implementation.
Latent Semantic Indexing
Came across an interesting article on the O'Reilly Network, Building a Vector Space Search Engine in Perl, that led me to an even more interesting paper on Latent Semantic Indexing. Fascinating stuff, and it could be very useful for providing full-text indexing of XML data, where you could add the XPath of a node as another dimension in the index. Something certainly worth exploring.
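Here's a rough sketch of that idea using a plain vector-space model (term counts compared by cosine similarity, the approach in the O'Reilly article) rather than full LSI, which would additionally reduce the term matrix with a singular value decomposition. Each element with text becomes a vector of its words plus one extra "path:" dimension holding its location, so a query can favor matches at a particular place in the document. The function names, the simple slash-separated path (not a full XPath) and the sample document are all hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def node_vectors(xml_text):
    """Return (path, text, vector) for every element with direct text.

    The vector counts lowercase words in the text and adds one extra
    dimension, "path:<path>", for the element's location.
    """
    results = []

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            vec = Counter(re.findall(r"[a-z0-9]+", text.lower()))
            vec["path:" + path] += 1
            results.append((path, text, vec))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return results


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(xml_text, words, path=None):
    """Rank elements against query words, optionally boosting one path."""
    query = Counter(w.lower() for w in words)
    if path:
        query["path:" + path] += 1
    scored = [(cosine(query, vec), p, text)
              for p, text, vec in node_vectors(xml_text)]
    return sorted(scored, reverse=True)


posts = """<posts>
  <post><title>XML databases</title><body>Native XML databases and XPath</body></post>
  <post><title>Mac OS X</title><body>Panther and BeFS</body></post>
</posts>"""

for score, path, text in search(posts, ["xml"], path="/posts/post/title"):
    print("%.3f %s %s" % (score, path, text))
```

With the path dimension in the query, the "XML databases" title outranks the body that also mentions XML, because it matches on both the word and the location.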
