By Steve Wilcockson
Avi Palley, Lead Quantitative Strategist in eFX Trading at Wells Fargo, was recently interviewed by Carlos Zendejas, also a one-time quant and now CEO and co-founder of Digital Q, in the Deep Q Cast series about building AI-ready trading systems. Their conversation is compelling and passionate: two quants bringing to life the changing role of the quant amidst the excitement of automation, AI and machine learning. Avi is an industry veteran and renowned programmer with considerable experience in system development, architecture and maintenance, built over long stints at Galaxy Digital and Merrill Lynch Bank of America.
Avi outlined the evolution of quantitative analytics in finance, tracing it from the early days of structured finance, through areas like derivatives pricing, to the later impact of regulation and collateralization. Today the discipline is far less about, say, options pricing, which is often just a function call away, and much more about how models are used and the business impact they deliver, as with any asset class.
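That “just a function call away” point is easy to demonstrate. Here is a minimal, illustrative Black-Scholes call pricer in plain Python; the function and its inputs are my own sketch, nothing from the interview:

```python
from math import erf, exp, log, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function (standard library only)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot: float, strike: float, rate: float,
                       vol: float, t: float) -> float:
    """Black-Scholes price of a European call option."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)

# A one-year at-the-money call: roughly 10.45 for these inputs.
print(black_scholes_call(spot=100, strike=100, rate=0.05, vol=0.2, t=1.0))
```

The pricing itself is the easy part; as Avi argues, the value now lies in how such calls are used and what business impact they deliver.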
The role of the quant has changed too. He describes three quant types: those who love mathematics, enjoy modeling and go deep into machine learning; quant developers and coders, less concerned with the underlying mathematics but knowledgeable enough about it to implement it well; and business-focused types, communicators who bridge the other two and bring a commercial perspective.
As someone who has worked with quants since the discipline’s explosion in the nineties, I recognize all three types Avi describes. When their personalities shine through, they’re amazing people. A “business” type quant fund manager I once worked with (albeit one with strong maths and code) happened to be a synth player in a brilliant goth synth band that supported Depeche Mode in the nineties. I bought his band’s tapes and CDs, and was starstruck when I met him! A developer-focused quant now runs the IT stack at a major university’s library, revelling in his geekdom. When we meet up, we share stories of books and code. And the French quants, whatever their fit, more often than not offer brilliant discussions about philosophy, art and cinema.
But I digress. As Avi explains, all three are subject to the criticality of data, and I completely agree (I’d add that the model matters too). He and Carlos discussed the importance of data cleanliness, integrity, tooling, consistency and the workflow that manages the data lifecycle. For AI systems in particular (data-driven models rather than traditional closed-form quant models), data matters even more. As Avi stated succinctly, approach data from first principles:
“Build data into your trading systems, or at least think about proper data from day one. It’s not something that you decide [on] six months down the road”
When organizations fail to think ahead, they unwittingly develop Rube Goldberg/Heath Robinson-style arrangements that incur significant technical debt. They also fail to address how the data will actually be processed and consumed over time to deliver business value. To mitigate such risks, Avi and Carlos discussed the challenges faced in developing next-generation systems and offered practical approaches to address them, which I’ve aggregated into five steps:
Step 1: Accommodate Change but Retain Control
Change never changes. Accommodate it from the outset. In architectural terms, start generic and abstract, and minimize early-stage hard coding in areas like schemas. Instead, set a framework that allows schemas to adapt without having to rebuild entire datasets, and avoid tight coupling between, say, the data and its visualisation to ensure future flexibility. In practical terms, present well-defined APIs with parameterization that support controlled bespoke development but are resilient to underlying system changes. Well-engineered APIs, through encapsulation, shield users from data imperfections that may require, for example, very specific aggregations, or catering for known quirks in the dataset. As someone who once constructed a time-stamped but highly inconsistent “alternative” dataset, I recognize that challenge.
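As a sketch of what that encapsulation might look like, here is a hypothetical Python/pandas API layer; the function name, columns and clean-up rules are mine, purely to illustrate a parameterized interface that shields users from known data imperfections:

```python
import pandas as pd

# Hypothetical API layer: callers ask for a clean, aggregated view and never
# touch the raw table, so known imperfections are handled in one place.
def get_prices(raw: pd.DataFrame, symbol: str, freq: str = "1min",
               agg: str = "last") -> pd.DataFrame:
    """Return a resampled price series for one symbol.

    `freq` and `agg` are the parameterization points; the schema of `raw`
    can evolve underneath without breaking callers.
    """
    df = raw.loc[raw["symbol"] == symbol, ["timestamp", "price"]].copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    df = df.dropna(subset=["timestamp", "price"])              # known imperfection: bad rows
    df = df.drop_duplicates(subset="timestamp", keep="last")   # known imperfection: duplicates
    return (df.set_index("timestamp")
              .sort_index()["price"]
              .resample(freq)
              .agg(agg)
              .to_frame("price"))
```

Callers only ever see the clean, aggregated view; if the raw schema or its quirks change, only this one function has to change with it.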
Step 2: Deliver Early and Deliver Often
This step speaks for itself, but good design should never come at the cost of delivery. Don’t be over-prescriptive, as different problems require different approaches. Be sufficiently pragmatic to understand that issues will arise that may upend the best-laid plans, but always have, and evolve, a process and a framework. As Carlos appositely quotes Mike Tyson, “Everyone’s got a plan until they get punched in the mouth”. View things strategically and tactically (also read: commercially), with the strategic vision having an end-point, but take iterative steps that deliver value to the business along the way. Yes, that’s important beyond the data realm too, but from a data software standpoint, it’s a great way to motivate teams and build positive feedback loops with DevOps and, if applicable, MLOps processes.
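One concrete way to build that feedback loop is to gate each data delivery with automated checks in the DevOps/MLOps pipeline. The sketch below is hypothetical (the file name, columns and rules are mine) and assumes a pandas-readable tick file:

```python
import pandas as pd

# Hypothetical data-quality gate run in CI/CD on each delivery:
# fail fast, fix, redeliver.
def validate_ticks(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; empty means the data passes."""
    problems = []
    if df["timestamp"].isna().any():
        problems.append("missing timestamps")
    if not df["timestamp"].is_monotonic_increasing:
        problems.append("timestamps out of order")
    if (df["price"] <= 0).any():
        problems.append("non-positive prices")
    if df.duplicated(subset=["symbol", "timestamp"]).any():
        problems.append("duplicate (symbol, timestamp) rows")
    return problems

if __name__ == "__main__":
    ticks = pd.read_parquet("ticks.parquet")   # hypothetical daily delivery
    issues = validate_ticks(ticks)
    if issues:
        raise SystemExit("data-quality gate failed: " + "; ".join(issues))
```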
Step 3: Use the Right Tools
We are awash with exciting new technologies. Many promise the world but don’t always deliver: how many natively support time-series data, for example? There remains the attraction (or is that distraction?) of new approaches and open-source initiatives that advertise well on developer community sites but may lack the technical documentation or support services needed to solve problems. Avi wryly notes:
“When you think about something as fundamental as data, you don’t want to be building your trading system on a house of cards.”
He cites, in contrast, “the proven nature of kdb as an example of a technology used by every major bank on Wall Street and virtually all buy-side institutions”, where its native support for things like asof joins (for aligning across disparate data sets) and window joins (for grouping over time) can be performed instantly without complex parallelizing, speeding up both development time and runtime. Within and beyond these organizations, Avi states, “kdb is the de facto in quant finance. You can’t show up for a job in quant finance and not know it.” Now, I’d argue that Python comes above it on the list, and I might add MATLAB with a twinkle in my eye, having once been a MATLABer myself, but for production analytics that matter, Avi is right that “kdb is the best in class, in fact it is probably the only in class”.
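For readers who have not met an asof join, here is the concept illustrated in Python/pandas (kdb’s aj does this natively, in-database and at scale); the tiny trade and quote tables are made up for the example:

```python
import pandas as pd

# Toy FX trade and quote tables, purely for illustration.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-02 09:30:00.100",
                            "2024-01-02 09:30:00.450",
                            "2024-01-02 09:30:01.200"]),
    "sym": ["EURUSD", "EURUSD", "EURUSD"],
    "price": [1.0841, 1.0842, 1.0840],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-02 09:30:00.000",
                            "2024-01-02 09:30:00.400",
                            "2024-01-02 09:30:01.000"]),
    "sym": ["EURUSD"] * 3,
    "bid": [1.0839, 1.0840, 1.0838],
    "ask": [1.0842, 1.0843, 1.0841],
})

# Asof join: for each trade, attach the most recent quote at or before its time.
aligned = pd.merge_asof(trades.sort_values("time"),
                        quotes.sort_values("time"),
                        on="time", by="sym", direction="backward")
print(aligned)
```

Each trade picks up the prevailing quote at its timestamp; kdb’s aj (and wj for window joins) does the equivalent natively where the data lives, which is exactly Avi’s point about development time and runtime.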
Step 4: … But Use the Right Tools Optimally
Using familiar tools is important, but using them optimally matters too. One-size-fits-all may be an attractive panacea for reducing both cost and risk when transferring data and models across and within systems, but is it the right tool for the job? Not many doctors I know use Swiss Army knives for surgery. Weigh the risks and opportunities of deploying the known against the new. Avi notes it would be a mistake to say, for example, “I’m doing 80% of my work in Python, let me also do my data queries in Python” when it is clear that the aggregations mentioned above can be executed much faster and more easily in kdb. Avi and Carlos go on to have a fascinating discussion about programming language strengths and weaknesses, for example:
“Python is pretty much gold standard for data analysis, machine learning, AI, and the amount of productivity that you get by using Python compared to other languages for something like that is great, especially if you’re using JupyterLab… Your productivity is massive compared to … say Java or C++. We talked about kdb. So q is the language for kdb, really, really good at data aggregation… asof joins, being able to do SELECT statements and doing them quick… Striking that balance where you use things to their strengths, and then bridge them together, is what I think is very important.”
A theme I explore in my blogs elsewhere contrasts simplicity and complexity in architectures. There’s no right answer; it’s an architect’s choice. I can squish my stack, or I can build an infrastructure that takes the best of many things to deliver a whole greater than the sum of its parts. On the flipside, the former might produce a jack-of-all-trades, master-of-none application, and the latter a software spaghetti full of technical debt. With Python and kdb, however, you get the best of both worlds: Python can drive collaborative model development and the analytics pipeline, with kdb handling the blazingly fast data and model queries. As Avi notes, “its treatment of data as a first class citizen is what makes kdb unique.”
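As a sketch of that bridging, here is what “use each to its strengths” might look like with KX’s PyKX library; it assumes PyKX is installed and licensed, and the toy table and column names are mine for illustration:

```python
# q/kdb for the heavy data aggregation, Python for everything downstream,
# bridged via PyKX's embedded q interpreter.
import pykx as kx

# A tiny trades table defined in q; in practice this would already live in kdb.
kx.q('trades:([] time:09:30:00.100 09:30:00.400 09:30:01.200; '
     'sym:3#`EURUSD; price:1.0841 1.0842 1.0840; size:1000000 2000000 500000)')

# Let q do what it does best: a concise, fast aggregation (a VWAP by symbol)...
vwap = kx.q('select vwap:size wavg price by sym from trades')

# ...then hand the result to Python/pandas for modelling, plotting or analysis.
df = vwap.pd()
print(df)
```

The aggregation happens where the data lives; Python simply picks up the result, which is the balance Avi describes.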
Step 5: Respect Your Data. Don’t Torture It
“If you torture data long enough, it will confess to anything” and its variations are usually attributed to economist and Nobel prize winner Ronald Coase. Whatever its origin, it has become a rallying cry for those inclined to the “lies, damned lies and statistics” critique, making the case, for example, that the average person has one-point-something legs: arithmetically true, since a small minority have fewer than two legs and so drag the mean just below two, but hardly a useful truth. The interview finished on a similarly philosophical note:
“When you’re actually in practice, the way you make money is by handling esoteric problems and treating them with integrity from a data and modelling perspective. You can make data say whatever you want it to say, but when it comes to making money, unless your data is actually telling the truth, it’s not going to help you.”
To torture the data (and yourself), follow Heath Robinson. To find the truth in your data, take the counsel of practitioners like Avi and advisors like Digital Q, who can help architect the systems, tools and methodology that unlock it. Finally, as Avi says, always trust a kdb developer: “all the people who know what they’re doing think in kdb.”
Click here to watch the interview in full.