Mar 19, 2019

How to implement routing with n-ary tree

902 Words

2019-03-19 12:00 +0100

Routing mechanism is one of the main parts of a web service. Here I provide one approach to achieve this. This implementation is based on n-ary tree and tree search.

Many ways are available to handle routes, from regex to n-ary trees to parse registered route one by one.

Different ways

N-ary Trees

In this approach we can consider the root node as / and all children nodes as route parts are separated by /. Every part is on a separated level of the tree. A node contains one route part.

A route part can be either a string which requires full match or a path param which is a string surrounded by curly brackets ({}) also acceptable.

Regex

All defined routes go into a stack in the order of definition. During procession the earlier match is executed or returns the result of the executed lambda.

Parse one by one

This approach is very similar to the above one but without regex. Let us consider a registered route as a key in a dictionary. The value is the lambda which is intended to be executed when call falls on the route.

After iterating over the keys of the dictionary either use regex to identify the appropriate route or parse them and process by using string operations (for those who are not into regex).

Rule objects

Tornado framework provides a fine-grained mechanism to define and implement routings. The framework introduced rules and delegators parse and execute expressions.

The Approach

In this solution the n-ary tree approach is chosen. Each node contains one part of the route, depth of the tree equals to the longest route splitted by /. All children of a node contain the next route part after the node value separated by /.

Register a route

For route registration we can use the register function. It takes the route itself, which will be parsed, splitted and considered as an identifier part. The whole identifier of a node is the path inside the tree from the root to the given node. Method is also part of the identifier that means the same route with different methods are placed in two different nodes as siblings. params are considered as a whitelist of queryparams. Those queryparams that are not in this list, those are not parsed and processed, and passed to the lambda.

routing.register(route='/users', 
                 callback=(lambda name, email: users.create_user(name, email)),
                 method=routing.Methods.POST,
                 params={'name', 'email'})

routing.register(route='/users/{id}', 
                 callback=(lambda id: users.get_user(id)), 
                 method=routing.Methods.GET)

Process a route

The process function takes the three agruments as shown below and returns the result of the executed lambda. If lambda returns void than process gives back None.

request_body = routing.process(event.get('path'), 
                               event.get('queryStringParameters'), 
                               event.get('httpMethod'))

Routing Tree

Routing tree represents the registered routes. Each node contains the route part, the full path from the root to itself, callbacks and children nodes.

Callbacks are stored in a dictionary with methods as keys and lambdas as values. So this way one path could have a POST and a GET or any other methods at the same time.

A node looks like as follows:

(value="/" resource_path='/' callbacks={} children: [
  (value="users" resource_path='/users' callbacks={'GET': {callback=
  lambda-expression,
  params={}}}  children: [])
])

The routing tree has two methods that can build the tree up or find the appropriate path.

add: responsible to handle the registration recursively.
find: responsible for evaluation of the call.

Both methods traverse the tree recursively, finding matching parts. If parts found, and no other path left, it returns the appropriate value, or it goes towards to one of the children if necessary.

Add route with pathparam

Pathparam is also part of the representation so it goes as a node value, like {id}. This implementation requires that no multiple pathparam names are allowed. That means no {id} and {user_id} are allowed at the same time.

Process a call

Routing trees find method is responsible for getting back the lambda call based on the route path and the method. If one of them are differ from what the tree contains, find raise an error accordingly.

At first, find checks wether the path on a given node could match. If so, it traverse through children searching for exact matching. If no matching found on the level, it considers the path as a pathparam and goes towards the children of the pathparam node.

Validation

register as well as process have their own validators against injections and malicious characters that can mess up with the database.

Here I used very simple validation by using regex, but it is alterable according to the needs.

PATH_PARAM_PATTERN = '^{([a-zA-Z0-9]+)}$'
PATH_PART_PATTERN = '^[a-zA-Z0-9\-]+$'

QUERY_KEY_PATTERN = '^[a-zA-Z0-9]+$'
QUERY_VALUE_PATTERN = '(^[a-zA-Z0-9\-@\. ]+$)'

The implementation gives a chance to fine-tune the validation by introducing param specific patterns. These could be placed in either validate_route_register or validate_route_process or in both.

What it does not contain

It is an experimental project so it lacks some other features. The implementation of these features does not require much alteration on the code, though I leave it for others. The list below.

Process request body and send data as JSON
Optional params for PATCH
Validate params to avoid updating id
Context param of handler func
Builder for response (which can be found in my other post)

In this post we saw one approach of implementing routing in python, how to declare route with method, params and lambda, and took a look on how to process it.

For further reading and for the implementation you can visit https://github.com/torokmark/routing_python.