Twisted.Web2 Object Traversal

  1. Object Traversal Basics
  2. locateChild in depth
  3. childFactory method
  4. child_* methods and attributes
  5. Dots in child names
  6. The default trailing slash handler
  7. IRequest.prepath and IRequest.postpath
  8. Conclusion

Object traversal is the process Twisted.Web2 uses to determine which object should render the response for a particular URL. When an HTTP request comes in to the web server, the object publisher splits the URL into segments and repeatedly calls methods that consume path segments and return objects representing that path, until all segments have been consumed. At its core, the Web2 traversal API is very simple, but it layers some higher-level functionality on top to satisfy common use cases.

Object Traversal Basics

The root resource is the top-level object in the URL space; it conceptually represents the URI "/". The Twisted.Web2 object traversal and object publishing machinery uses only two methods to locate an object suitable for publishing and to generate the response from it; these methods are described in the interface twisted.web2.iweb.IResource:

from zope.interface import Interface

class IResource(Interface):
    """
    I am a web resource.
    """

    def locateChild(request, segments):
        """Locate another object which can be adapted to IResource.

        return: A 2-tuple of (resource, remaining-path-segments),
                or a deferred which will fire the above.

                Causes the object publishing machinery to continue on
                with specified resource and segments, calling the
                appropriate method on the specified resource.

                If you return (self, L{server.StopTraversal}), this
                instructs web2 to immediately stop the lookup stage,
                and switch to the rendering stage, leaving the
                remaining path alone for your render function to
                handle.
        """

    def renderHTTP(request):
        """Return an IResponse or a deferred which will fire an
        IResponse. This response will be written to the web browser
        which initiated the request.
        """

Let's examine what happens when object traversal occurs over a very simple root resource:

from zope.interface import implements
from twisted.web2 import iweb, http, stream

class SimpleRoot(object):
    implements(iweb.IResource)

    def locateChild(self, request, segments):
        # Consume every remaining segment: this object handles all URLs.
        return self, ()

    def renderHTTP(self, request):
        return http.Response(200, stream=stream.MemoryStream("Hello, world!"))

This resource, when passed as the root resource to server.Site or wsgi.createWSGIApplication, will immediately return itself, consuming all path segments. This means that for every URI a user visits on a web server which is serving this root resource, the text "Hello, world!" will be rendered. Let's examine the value of segments for various values of URI:

/foo/bar
  ('foo', 'bar')

/
  ('', )

/foo/bar/baz.html
  ('foo', 'bar', 'baz.html')

/foo/bar/directory/
  ('foo', 'bar', 'directory', '')
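The splitting rule behind this table can be sketched in a few lines of plain Python. This is a simplified model for illustration only: split_segments is a hypothetical helper, not a web2 function, and it assumes an already-decoded path with no query string, ignoring any percent-decoding a real server performs.

```python
def split_segments(path):
    """Hypothetical helper: turn a URI path into traversal segments.

    Assumes `path` starts with '/' and carries no query string.
    """
    # Splitting '/foo/bar' on '/' gives ['', 'foo', 'bar']; drop the empty
    # string before the leading slash. A trailing slash still yields a final ''.
    return tuple(path.split('/')[1:])


for uri in ('/foo/bar', '/', '/foo/bar/baz.html', '/foo/bar/directory/'):
    print(uri, split_segments(uri))
```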
    

So we see that Web2 does nothing more than split the URI path on the string '/' (dropping the empty string before the leading slash) and pass these path segments to our application for consumption. Armed with these two methods alone, we already have enough to write applications that serve any form of URL imaginable in any way we wish. However, there are some common URL handling patterns for which Twisted.Web2 provides higher-level support.

locateChild in depth

One common URL handling pattern involves parents that only know about their direct children. For example, a Directory object may only know about the contents of a single directory; if it contains other directories, it does not know about their contents. Let's examine a simple Directory object which can provide directory listings and serves up objects for child directories and files:

import os

from twisted.web2 import http, resource, static, stream

class Directory(resource.Resource):
    def __init__(self, directory):
        self.directory = directory

    def renderHTTP(self, request):
        # List the directory; subdirectory links get a trailing slash.
        html = ['<ul>']
        for child in os.listdir(self.directory):
            fullpath = os.path.join(self.directory, child)
            if os.path.isdir(fullpath):
                child += '/'
            html.extend(['<li><a href="', child, '">', child, '</a></li>'])

        html.append('</ul>')
        html = stream.MemoryStream(''.join(html))
        return http.Response(200, stream=html)

    def locateChild(self, request, segments):
        # Consume one segment and let the returned child handle the rest.
        # (A production version should also reject names such as '..'.)
        name = segments[0]
        fullpath = os.path.join(self.directory, name)
        if not os.path.exists(fullpath):
            return None, () # 404

        if os.path.isdir(fullpath):
            return Directory(fullpath), segments[1:]
        if os.path.isfile(fullpath):
            return static.File(fullpath), segments[1:]
        return None, () # neither a file nor a directory: also 404

Because this implementation of locateChild consumes only one segment and returns the rest (segments[1:]), the object traversal process continues by calling locateChild on the returned resource with the partially consumed segments. In this way, a directory structure of any depth can be traversed, and directory listings or file contents can be rendered for any existing directories and files.

So, let us examine what happens when the URI "/foo/bar/baz.html" is traversed, where "foo" and "bar" are directories, and "baz.html" is a file.

  1. Directory('/').locateChild(request, ('foo', 'bar', 'baz.html')) - Returns Directory('/foo'), ('bar', 'baz.html')
  2. Directory('/foo').locateChild(request, ('bar', 'baz.html')) - Returns Directory('/foo/bar'), ('baz.html',)
  3. Directory('/foo/bar').locateChild(request, ('baz.html',)) - Returns File('/foo/bar/baz.html'), ()

No more segments remain to be consumed, so File('/foo/bar/baz.html').renderHTTP(request) is called, and the result is sent to the browser.
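The loop the publisher runs can be modelled without web2 at all. The sketch below is an illustration under stated assumptions: FakeDirectory, FakeFile, traverse and STOP are hypothetical names, an in-memory dict stands in for the file system, and a None result stands in for the 404 the real server would produce. Recording each call reproduces the three steps listed above.

```python
STOP = object()  # hypothetical stand-in for server.StopTraversal


class FakeFile(object):
    """Hypothetical leaf resource holding a file body in memory."""

    def __init__(self, name, body):
        self.name = name
        self.body = body

    def locateChild(self, request, segments):
        return None, ()  # a file has no children: 404

    def renderHTTP(self, request):
        return self.body


class FakeDirectory(object):
    """Hypothetical in-memory version of the Directory example."""

    def __init__(self, name, contents):
        self.name = name          # e.g. '/foo'
        self.contents = contents  # child name -> dict (directory) or str (file)

    def locateChild(self, request, segments):
        child = self.contents.get(segments[0])
        if child is None:
            return None, ()       # 404
        path = self.name.rstrip('/') + '/' + segments[0]
        if isinstance(child, dict):
            return FakeDirectory(path, child), segments[1:]
        return FakeFile(path, child), segments[1:]


def traverse(root, request, segments, trail=None):
    """Call locateChild repeatedly until every segment is consumed."""
    resource, segments = root, tuple(segments)
    while segments:
        if trail is not None:
            trail.append((resource.name, segments))
        resource, segments = resource.locateChild(request, segments)
        if resource is None:
            return None           # the real server would answer 404
        if segments is STOP:
            break                 # leave the remaining path to the renderer
    return resource


root = FakeDirectory('/', {'foo': {'bar': {'baz.html': 'Hello'}}})
calls = []
leaf = traverse(root, None, ('foo', 'bar', 'baz.html'), calls)
print(calls)
print(leaf.renderHTTP(None))
```

Note that this model trusts each locateChild to make progress; a resource that returned its segments unchanged would loop forever.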

childFactory method

Consuming one URI segment at a time by checking to see if a requested resource exists and returning a new object is a very common pattern. Web2's default implementation of twisted.web2.iweb.IResource, twisted.web2.resource.Resource, contains an implementation of locateChild which provides more convenient hooks for implementing object traversal. One of these hooks is childFactory. Let us imagine for the sake of example that we wished to render a tree of dictionaries. Our data structure might look something like this:

tree = dict(
    one=dict(
        foo=None,
        bar=None),
    two=dict(
        baz=dict(
            quux=None)))

Given this data structure, the valid URIs would be:

  /
  /one
  /one/foo
  /one/bar
  /two
  /two/baz
  /two/baz/quux

Let us construct a twisted.web2.resource.Resource subclass which uses the default locateChild implementation and overrides the childFactory hook instead:

from twisted.web2 import http, resource, stream

class DictTree(resource.Resource):
    def __init__(self, dataDict):
        self.dataDict = dataDict

    def renderHTTP(self, request):
        if self.dataDict is None:
            content = "Leaf"
        else:
            html = ['<ul>']
            for key in self.dataDict.keys():
                html.extend(['<li><a href="', key, '">', key, '</a></li>'])
            html.append('</ul>')
            content = ''.join(html)

        return http.Response(200, stream=stream.MemoryStream(content))

    def childFactory(self, request, name):
        # Leaves (None) have no children; unknown keys are also a 404.
        if self.dataDict is None or name not in self.dataDict:
            return None # 404
        return DictTree(self.dataDict[name])

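As you can see, the childFactory implementation is considerably shorter than the equivalent locateChild implementation would have been: childFactory receives a single segment name and returns only the child resource, while the default locateChild takes care of slicing off that segment and passing the rest on.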
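To see how the hooks divide the work, here is a simplified, self-contained model of the dispatch that resource.Resource.locateChild performs. It is an assumption-laden sketch, not the web2 source: ModelResource is a hypothetical name, child_ names are assumed to be consulted before childFactory, and an object with a locateChild method is treated as a ready-made resource while anything else found under a child_ name is called with the request.

```python
class ModelResource(object):
    """Hypothetical, simplified model of resource.Resource's locateChild."""

    def locateChild(self, request, segments):
        name, rest = segments[0], segments[1:]
        # 1. A child_<name> attribute or method (assumed to be checked first).
        found = getattr(self, 'child_' + name, None)
        if found is not None:
            if hasattr(found, 'locateChild'):
                return found, rest          # child_ attribute: use it as is
            return found(request), rest     # child_ method: call it
        # 2. Otherwise fall back to the childFactory hook, if defined.
        factory = getattr(self, 'childFactory', None)
        if factory is not None:
            child = factory(request, name)
            if child is not None:
                return child, rest
        return None, ()                     # nothing matched: 404


class DictTree(ModelResource):
    def __init__(self, dataDict):
        self.dataDict = dataDict

    def childFactory(self, request, name):
        if self.dataDict is None or name not in self.dataDict:
            return None
        return DictTree(self.dataDict[name])


tree = dict(one=dict(foo=None, bar=None), two=dict(baz=dict(quux=None)))
child, rest = DictTree(tree).locateChild(None, ('two', 'baz', 'quux'))
print(sorted(child.dataDict), rest)
```

The subclass only answers "which object is the child called name?"; slicing the segments and the 404 fallback live in the base class.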

child_* methods and attributes

Often we may wish to have some hardcoded URLs which are not dynamically generated from a data structure. For example, we might have an application which uses an external CSS stylesheet, an external JavaScript file, and a folder full of images. The twisted.web2.resource.Resource.locateChild implementation provides a convenient way to express these relationships using child_ prefixed methods:

from twisted.web2 import resource, http, static, stream

class Linker(resource.Resource):
    def renderHTTP(self, request):
        page = """<html>
    <head>
      <link href="css" rel="stylesheet" />
      <script type="text/javascript" src="scripts"></script>
    </head>
    <body>
      <img src="images/logo.png" />
    </body>
  </html>"""

        return http.Response(200, stream=stream.MemoryStream(page))

    def child_css(self, request):
        return static.File('/Users/dp/styles.css')

    def child_scripts(self, request):
        return static.File('/Users/dp/scripts.js')

    def child_images(self, request):
        return static.File('/Users/dp/images/')

One thing you may have noticed is that all of the examples so far have returned new object instances whenever they implemented a traversal API. However, there is no reason these instances cannot be shared. One could, for example, return a global resource instance, an instance previously inserted in a dict, or lazily create and cache dynamic resource instances on the fly. The resource.Resource.locateChild implementation also provides a convenient way to express that one global resource instance should always be used for a particular URL: the child-prefixed attribute:

class FasterLinker(Linker):
    child_css = static.File('/Users/dp/styles.css')
    child_scripts = static.File('/Users/dp/scripts.js')
    child_images = static.File('/Users/dp/images/')
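Lazily creating and caching resources, the third option mentioned above, can also be expressed through childFactory. The sketch below is hypothetical (CachingParent and make_child are illustrative names, not web2 API) and is written without web2 imports so it runs on its own: each child is built on its first request and the same instance is returned afterwards.

```python
class CachingParent(object):
    """Hypothetical parent resource that builds each child once and reuses it."""

    def __init__(self, make_child):
        self.make_child = make_child  # callable: name -> resource, or None (404)
        self._cache = {}

    def childFactory(self, request, name):
        child = self._cache.get(name)
        if child is None:
            child = self.make_child(name)
            if child is not None:     # only cache hits, so unknown names
                self._cache[name] = child  # cannot grow the cache
        return child


built = []

def make_child(name):
    # Hypothetical factory: only names ending in '.css' exist.
    built.append(name)
    return {'name': name} if name.endswith('.css') else None

parent = CachingParent(make_child)
first = parent.childFactory(None, 'site.css')
second = parent.childFactory(None, 'site.css')
print(first is second, built)
```

Compared with a child_ class attribute, which is created once when the class is defined and shared by every instance, this cache belongs to one parent instance and fills in only for names that are actually requested.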

Dots in child names

When a URL contains dots, which is quite common in normal URLs, it is simple enough to handle these URL segments in locateChild or childFactory: one of the passed segments will simply be a string containing a dot. However, it is not immediately obvious how one would express a URL segment with a dot in it when using child-prefixed methods, since a name such as child_scripts.js cannot be written in a class body. The solution is really quite simple:

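from twisted.web2 import http, resource, static, stream

class DotChildren(resource.Resource):
    def renderHTTP(self, request):
        return http.Response(200, stream=stream.MemoryStream("""
<html>
  <head>
    <script type="text/javascript" src="scripts.js"></script>
  </head>
</html>"""))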

If you only wish to add a child to a specific instance of DotChildren, use the putChild method. putChild adds the child_ prefix itself, so it is given the bare segment name:

rsrc = DotChildren()
rsrc.putChild('scripts.js', static.File('/Users/dp/scripts.js'))

If instead you wish to add the child as a class attribute, shared by every instance, use setattr like so:

setattr(DotChildren, 'child_scripts.js', static.File('/Users/dp/scripts.js'))

The same technique could be used to install a child method with a dot in the name.
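This works because Python's getattr and setattr accept any string as an attribute name, even names that could never appear in a class body. The plain-Python sketch below uses no web2 imports (Page and the string values are hypothetical stand-ins for resources); it installs both a dotted attribute and a dotted method and then looks them up the way a child_-based dispatcher would, by building 'child_' + segment.

```python
class Page(object):
    """Hypothetical stand-in for a resource class such as DotChildren."""


# A dotted class attribute, shared by every instance.
setattr(Page, 'child_scripts.js', 'shared scripts resource')


def _data_json(self, request):
    return 'data for %s' % (request,)


# A plain function stored on the class becomes a method, dots and all.
setattr(Page, 'child_data.json', _data_json)

page = Page()
for segment in ('scripts.js', 'data.json'):
    found = getattr(page, 'child_' + segment)
    print(segment, found if isinstance(found, str) else found('req'))
```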

The default trailing slash handler

When a URI being handled ends in a slash, such as when the '/' URI is rendered or when a directory-like URI is rendered, the string '' appears in the path segments to be traversed. Again, handling this case is trivial inside either locateChild or childFactory, but it may not be immediately obvious which child-prefixed method or attribute will be looked up. The name used is simply child_: the child_ prefix followed by the empty segment name.

The resource.Resource class provides an implementation of this method which can work in two different ways. If the attribute addSlash is True, the default trailing slash handler will return self. In addition, when addSlash is True and a request arrives for the URL without its trailing slash, the default resource.Resource.renderHTTP implementation simply performs a redirect which adds the missing slash to the URL.

The default trailing slash handler also returns self if addSlash is False, but emits a warning as it does so. This warning may become an exception at some point in the future.
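The behaviour described above can be summarized in a small model. SlashModel, child_ and redirect_target here are hypothetical stand-ins that follow this document's description (return self, warn when addSlash is False, redirect a slash-less URL when addSlash is True); they are not the web2 implementation, whose details may differ between releases.

```python
import warnings


class SlashModel(object):
    """Hypothetical model of the trailing-slash behaviour described above."""

    addSlash = False

    def child_(self, request):
        # Looked up for the '' segment produced by a trailing slash.
        if not self.addSlash:
            warnings.warn('%s handled a trailing slash without addSlash'
                          % type(self).__name__)
        return self

    def redirect_target(self, path):
        # With addSlash set, a URL missing its trailing slash is redirected.
        if self.addSlash and not path.endswith('/'):
            return path + '/'
        return None


class DirectoryLike(SlashModel):
    addSlash = True


print(DirectoryLike().redirect_target('/docs'))   # prints /docs/
```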

IRequest.prepath and IRequest.postpath

During object traversal, it may be useful to discover which segments have already been handled and which remain. In locateChild the remaining segments are given as the second argument. However, since all object traversal APIs are also passed the request object, this information can also be obtained from the request: IRequest.prepath holds the segments already consumed, and IRequest.postpath holds those still to be traversed.
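One way to picture this bookkeeping is as segments moving from postpath to prepath after each locateChild call. The sketch below is a hypothetical model (FakeRequest and advance are illustrative names); it assumes both attributes are lists and that every consumed segment is appended to prepath, which matches the description above but may differ in detail from web2's server code.

```python
class FakeRequest(object):
    """Hypothetical request carrying only the two traversal attributes."""

    def __init__(self, segments):
        self.prepath = []               # segments already handled
        self.postpath = list(segments)  # segments still to be handled


def advance(request, remaining):
    """Record one locateChild call that left `remaining` unconsumed."""
    consumed = len(request.postpath) - len(remaining)
    request.prepath.extend(request.postpath[:consumed])
    request.postpath = list(remaining)


request = FakeRequest(('foo', 'bar', 'baz.html'))
advance(request, ('bar', 'baz.html'))   # the root consumed 'foo'
print(request.prepath, request.postpath)
advance(request, ())                     # a child consumed everything else
print(request.prepath, request.postpath)
```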

Conclusion

Twisted.Web2 makes it easy to handle complex URL hierarchies. The most basic object traversal interface, twisted.web2.iweb.IResource.locateChild, provides powerful and flexible control over the entire object traversal process. Web2's canonical IResource implementation, resource.Resource, also includes the childFactory convenience hook along with child-prefixed method and attribute semantics to simplify common use cases.


Version: 8.1.0