By Eoghan Page
XML is one of those standard tools most programmers come across at some point in their careers, education or training. An XML document usually declares its character encoding on the first line, so given a document anyone can write their own parser or use open source tools to turn it into what they need. While some developers, myself included, might laud the newer and shinier JSON as more readable and in many cases faster to parse, we can’t escape the fact that XML is ubiquitous and will remain so for quite some time. There are many ways to attack parsing it, but when we need to generate XML output that conforms to a standard we are often left to fend for ourselves.
In recent months I’ve developed a solution to generate FpML (Financial products Markup Language) for some interest rate derivatives, specifically interest rate swaps, tenor basis swaps and forward rate agreements.
Originally, I had done a similar task in Python. This used some open source libraries to parse the XML schema documents (.xsd files) associated with the FpML standard. This would then generate a structure of binding classes which determined tags by inheritance and would create their related XML with a simple function call. Since the data was being generated in kdb+, and prone to change with little notice, it hit a point where the Python solution was growing into an unmaintainable giant that added a lot of overhead to the project as a whole. I had started writing functions to wrap individual fields in tags, decided to eliminate IPC overhead by reading the data from CSVs, and had a hard time getting meaningful and clear error handling. It wasn’t a huge leap for me to then think I’d be better off doing it in kdb+ from scratch.
Going from a kdb+ table to an XML schema has its own obstacles and merits. On the one hand, we’ve got the advantage of working in ASCII, and since ASCII is a subset of UTF-8, anything we output is trivially UTF-8 compatible. On the other, we’ve got that two-dimensional way of thinking about data: a table with rows and columns. XML requires that we cast that aside. We can have numerous nested layers, abstract typing, and attributes which don’t fit into the paradigm of rows and columns. It falls to us then to wrap individual entries in their own appropriate tags, and to control how we nest them. I started out from an example on the Kx wiki, and ended up with the below function at the core of my implementation.
Fig 1. Central function that wraps an input string in XML tags.
This also makes use of a brief function, .fpml.isEmpty, to check if we have data to wrap – i.e. if we need to output anything at all.
Fig 2. .fpml.isEmpty checks if we’ve got a null type or a string of count 0, which can be a tricky one.
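For readers without the screenshots to hand, the idea behind Figs 1 and 2 can be sketched in a few lines. This is a minimal illustration in Python rather than the q code itself; the names `is_empty` and `tagit`, the argument order, and the attribute handling are all assumptions made for the example, and it assumes the value passed in is already XML-safe.

```python
from xml.sax.saxutils import quoteattr


def is_empty(value):
    """True when there is nothing to wrap: None or a zero-length string."""
    return value is None or len(str(value)) == 0


def tagit(tag, value, attrs=None):
    """Wrap value in <tag ...>...</tag>, or return "" when value is empty.

    attrs is an optional dict of attribute names to values (hypothetical
    interface). The value is inserted as-is so that already-tagged
    strings can be nested inside an outer tag.
    """
    if is_empty(value):
        return ""
    attr_text = "".join(
        f" {name}={quoteattr(str(val))}" for name, val in (attrs or {}).items()
    )
    return f"<{tag}{attr_text}>{value}</{tag}>"


print(tagit("currency", "USD"))
print(tagit("amount", 1000000, {"id": "n1"}))
print(repr(tagit("currency", "")))
```

Returning an empty string for missing data means optional fields simply vanish from the output instead of producing empty elements.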
The more observant reader might have noticed already that the .fpml.tagit function won’t behave with more than two layers of nesting, and attributes are only ever applied to the innermost layer. For the implementation required this was adequate, and forced the larger and more complex structures to be broken into more manageable chunks that could be generated one at a time. Moving forward into other asset classes or XML schemas, this will be the first thing to improve, as it underpins the whole implementation.
Since that’s our fundamental block, it’s now more useful to start at the top and work inwards. The wrapper function .fpml.main takes a kdb+ table with the required naming conventions and opens a handle to a destination file, for example, the file fpml.xml in the QHOME directory. After printing out a header, we can iterate over each row of the table with a function that determines the trade type and applies an appropriate subroutine that’ll wrap up a relevant subset of columns into the appropriate parent tag.
Fig 3. The wrapper function used, which opens our filehandle and writes out a header before iterating through trades.
Fig 4. Our trade level decisions are to return a relevant header and determine the type of trade we’re using.
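The flow described for Figs 3 and 4 (write a header, then dispatch each row on its trade type) might look roughly like the following. Again this is a Python sketch under stated assumptions, not the q implementation: the column names `tradeType` and `notional`, the builder functions, and the simplified element layout are placeholders rather than the real FpML structure.

```python
import io

XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def tag(name, body):
    """Wrap body in <name>...</name>; empty bodies produce no output."""
    body = "" if body is None else str(body)
    return f"<{name}>{body}</{name}>" if body else ""


def swap_block(row):
    # Placeholder layout: a real swap carries far more detail.
    return tag("swap", tag("notional", row.get("notional")))


def fra_block(row):
    return tag("fra", tag("notional", row.get("notional")))


# Dispatch table keyed on the (hypothetical) tradeType column.
BUILDERS = {"swap": swap_block, "fra": fra_block}


def trade_to_xml(row):
    """Pick the builder for this row's trade type and wrap it in <trade>."""
    builder = BUILDERS.get(row["tradeType"])
    if builder is None:
        raise ValueError(f"unsupported trade type: {row['tradeType']!r}")
    return tag("trade", builder(row))


def main(rows, out):
    """Write the header, then append one block per trade to out."""
    out.write(XML_HEADER)
    for row in rows:
        out.write(trade_to_xml(row) + "\n")


rows = [
    {"tradeType": "swap", "notional": "1000000"},
    {"tradeType": "fra", "notional": "500"},
]
buffer = io.StringIO()
main(rows, buffer)
print(buffer.getvalue())
```

Here `out` is any writable text stream; to write to disk one could pass the result of `open("fpml.xml", "w", encoding="utf-8")` instead of the in-memory buffer. Raising on an unknown trade type is one way to get the clear error handling that was hard to come by in the earlier approach.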
Below the level shown in .fpml.swapLeg in Fig. 5, .fpml.tagit or a projection thereof is used to construct the section of the overall trade that each function corresponds to. As you can see in the wrapper, we just append all of these to the same file, and repeat until we’re finished.
Fig 5. Starting to drill down further, we use some conditionals to decide the appropriate trade type and enter into more specific functions until we’re generating blocks of FpML.
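The combination of conditionals and projections can be illustrated with partial application, which plays the role in Python that a projection plays in q. The sketch below is an assumption-laden stand-in for the lower-level functions: the keys `legType`, `ccy`, `notional`, `rate` and `index`, and the flattened `swapLeg` layout, are invented for the example and do not follow the actual FpML schema.

```python
from functools import partial


def tagit(tag, value):
    """Wrap value in <tag>...</tag>; None or "" produces no output."""
    value = "" if value is None else str(value)
    return f"<{tag}>{value}</{tag}>" if value else ""


# Fixing the tag name gives one small function per element,
# much as projecting a q function on its first argument would.
currency = partial(tagit, "currency")
amount = partial(tagit, "amount")
fixed_rate = partial(tagit, "fixedRate")
float_index = partial(tagit, "floatingRateIndex")


def swap_leg(leg):
    """Build one leg: a notional block plus a fixed or floating rate."""
    notional = tagit("notional", currency(leg.get("ccy")) + amount(leg.get("notional")))
    if leg.get("legType") == "fixed":
        rate = fixed_rate(leg.get("rate"))
    else:
        rate = float_index(leg.get("index"))
    return tagit("swapLeg", notional + rate)


print(swap_leg({"legType": "fixed", "ccy": "USD", "notional": 1000000, "rate": 0.025}))
print(swap_leg({"legType": "float", "ccy": "USD", "notional": 1000000, "index": "USD-LIBOR"}))
```

Because each helper returns an empty string for missing data, a leg with no populated fields collapses to nothing rather than emitting empty tags.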
The FpML standard itself is a large and comprehensive system for representing financial instruments across a variety of asset classes. There will always be scope to extend and improve upon this implementation, but once the core function that generates our tags is in place, the rest follows logically and you’re just filling in blanks. Putting thought into your design process and creating a mapping from your own schema to the XML standard you’re working with are essential to speed up the development of a project like this. Without my prior experience working with FpML in Python, the planning behind this could have taken a lot longer, sifting through the standard’s documentation. That said, our generation time and resource usage have been reduced by a factor of ten since the migration. This is an undertaking that may seem daunting at the outset, but the gains of keeping it in kdb+ are definitely worth it.
Eoghan Page (firstname.lastname@example.org) is a kdb+ developer working for Kx and based in London. He has a strong preference for functional programming and has coding experience with q/k, Python, C/C++, R, MATLAB, CLISP, and some other programming languages.
All code referenced above is available here, along with a small sample of mocked-up data and example usage. Screenshots above were taken using Analyst for Kx.