Big time-series Data
Frequently asked questions.
kdb+ Overview
- 1. What is kdb+?
kdb+ from Kx Systems is a high-performance, high volume database designed from the outset in anticipation of vast increases in data volumes. It is fully 64-bit, and has built-in multi-core processing and multi-threading. The same architecture is used for real-time and historical data. The database incorporates its own powerful query language, q, so that analytics can be run directly on the data.
- 2. Does kdb+ support multiple cores and/or multi-threading?
Yes, both are built into the system. Because kdb+ can make full use of all available cores, it is extraordinarily fast compared with traditional applications. The application developer does not have to write any special thread-aware code to use these facilities.
The number of cores available to kdb+ is specified in a startup parameter. kdb+ uses multi-threading in a number of different ways:
- parallel access to data partitions in large historical databases. The allocation of threads to disk partitions is user-configurable at startup (typically 1-2 threads per physical disk partition)
- multithreaded support for client queries: this can benefit when static data is served from an in-memory database
- parallel each: as in Google's MapReduce, an expensive algorithm over a large list can be split into distinct data chunks that are processed in parallel, with the results then recombined
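The split, process, recombine pattern described in the last bullet can be sketched in a few lines. This is a Python illustration of the general idea only, not q and not how kdb+ implements it; the names `chunked` and `parallel_sum_of_squares` are hypothetical, and a thread pool stands in for the worker threads.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_sum_of_squares(values, workers=4):
    """Apply a function to each chunk in parallel, then recombine."""
    chunks = chunked(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so partial results line up with input
        partials = list(pool.map(lambda chunk: sum(v * v for v in chunk), chunks))
    return sum(partials)  # the recombination ("reduce") step


total = parallel_sum_of_squares(list(range(1, 101)))
print(total)
```

In CPython, threads do not speed up pure-Python arithmetic like this; the sketch shows the structure of the technique rather than its performance.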
- 3. What is meant by a single architecture?
Most data management/data analysis solutions divide the world into real-time data and historical data. This makes it easier for partial approaches to claim proficiency at one or the other, but there are several disadvantages to the division, such as:
- delays while real-time data is converted and written to the historical database
- more complex queries to address both types of data
- the cost in time and memory of marshaling from one format to another when passing data from one vendor application to another
- similar but different SQL dialects, each of which has to be tuned, optimized and debugged
In contrast, kdb+ has one architecture for both, minimizing latency and simplifying queries.
- 4. What other features contribute to kdb+ performance?
Kx has refined the architecture in a number of ways, based on close attention to key performance criteria:
- a columnar structure for the database simplifies indexing and joins, and dramatically speeds search performance
- publish and subscribe mechanisms offload processing from the main server onto chained servers, allowing data services to be provided to a virtually unlimited number of clients
- date, time, and timestamp (to nanosecond) are basic data types, making time-ordered analysis extremely fast
- it is optimized for bulk inserts and updates, not just one record at a time. Exchange data typically arrives in blocks, and does not have to be written record by record. Also, the database does not need to be taken offline for bulk imports and exports
- kdb+ has dynamic indices, required to be able to make efficient use of real-time data
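To make the columnar point concrete, here is a small Python sketch of a column-oriented table whose time column is kept sorted, so a time window can be located by binary search instead of a full scan. It illustrates the general idea only; the table contents, the integer time encoding, and the helper names are assumptions made for the example, not kdb+ internals.

```python
from bisect import bisect_left, bisect_right

# Column-oriented table: one list per column, rows aligned by position.
trades = {
    "time":  [93000, 93001, 93005, 93010, 93012],  # sorted; made-up integer times
    "sym":   ["IBM", "MSFT", "IBM", "IBM", "MSFT"],
    "price": [100.0, 50.0, 101.0, 102.0, 51.0],
}


def time_window(table, start, end):
    """Row positions with start <= time <= end, found by binary search."""
    lo = bisect_left(table["time"], start)
    hi = bisect_right(table["time"], end)
    return range(lo, hi)


def average_price(table, sym, start, end):
    """Average price for one symbol in a time window; None if no rows match."""
    rows = time_window(table, start, end)
    # Only the sym and price columns are touched; other columns are never read.
    prices = [table["price"][i] for i in rows if table["sym"][i] == sym]
    return sum(prices) / len(prices) if prices else None


ibm_avg = average_price(trades, "IBM", 93001, 93010)
print(ibm_avg)
```

Because each column is stored contiguously, a query reads only the columns it names, and the sorted time column narrows the search before any values are compared.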
- 5. Is kdb+ just an in-memory database?
No. kdb+ provides a full relational database management system that handles data in memory as well as stored data on disk.
For advanced applications such as backtesting of automated trading strategies or operational risk management, it is essential to be able to compare real-time data against history. Approaches that handle in-memory data or historical data alone, or that combine a real-time or in-memory product from one vendor with a historical product from another, cannot deliver the performance necessary for real-time business, because they have to cope with two separate architectures. Excess overhead is unavoidable with multiple architectures.
- 6. Is the q query language needed to use kdb+?
A basic knowledge of q is needed to set up and administer a kdb+ database. Our experience is that most developers find q easy to learn and incredibly productive — even those whose prior experience is solely with traditional programming languages. Since q includes a SQL-like subset, it is very familiar to database programmers, who can be up and running within days.
The query language allows powerful ad hoc queries on-the-fly. This is important as financial services firms are being faced with a huge number of ad hoc reporting requirements from the regulatory authorities, and this is putting significant pressure on their data infrastructures.
Also, while end users may access kdb+ from a GUI, and know nothing of the database back end, in practice, many actually prefer to work directly with q in a kdb+ session, making use of the ability to write their own custom queries.
- 7. Are there any benchmarks for kdb+?
The Securities Technology Analysis Center has a tick database benchmark, the STAC-M3. It has been run with kdb+ on various hardware platforms, using typical queries on the equivalent of a year's NYSE TAQ data. This series of benchmarks enables users and vendors to compare the performance of their database solutions against audited, third-party measurements.
kdb+ System Environment
- 1. Which platforms does kdb+ run on?
kdb+ runs on industry-standard 64-bit architectures running Linux, Windows, Solaris (SPARC and Intel), or Mac OS X.
- 2. How does kdb+ scale?
kdb+ is multi-threaded and multi-process, and can scale across many machines, to petabyte sizes. It will run on clusters, grids, clouds, and other large scale distributed architectures.
- 3. How is real-time data handled?
The kdb+tick module supports data streams from a data feed or other source of real-time data, and makes the data available for immediate relational analysis. In addition, the data is logged so that, in case of a system failure, no data is lost. Periodically, the in-memory data is written to the historical database; a day's worth of real-time data can be written to the database in a few minutes. In fact, kdb+tick is so fast at managing real-time, in-memory, and stored data that some of our customers eliminate the traditional end-of-day window in which the database is taken offline. Because of this, kdb+tick can be used for advanced applications such as global 24x7 trading.
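The logging idea above, where each update is recorded before it is applied so that state can be rebuilt after a failure, can be sketched as follows. This is a generic write-ahead-log illustration in Python, not the kdb+tick implementation; the `LoggedTable` class, the JSON line format, and the file name are all hypothetical.

```python
import json
import os
import tempfile


class LoggedTable:
    """In-memory rows backed by an append-only log file (illustrative design)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.rows = []
        if os.path.exists(log_path):
            self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-reading every logged row in order.
        with open(self.log_path) as log:
            for line in log:
                self.rows.append(json.loads(line))

    def insert(self, row):
        # Write to the log first, so a crash after this point loses nothing.
        with open(self.log_path, "a") as log:
            log.write(json.dumps(row) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.rows.append(row)


log_path = os.path.join(tempfile.mkdtemp(), "ticks.log")
live = LoggedTable(log_path)
live.insert({"sym": "IBM", "price": 100.0})
live.insert({"sym": "MSFT", "price": 50.0})

# Simulate a restart: a fresh object rebuilds its state from the log.
recovered = LoggedTable(log_path)
print(recovered.rows == live.rows)
```

Syncing on every insert is the simplest safe choice for a sketch; a production system would more likely batch writes to trade a small recovery window for throughput.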
- 4. Is it really necessary to save all that data?
One of the reasons kdb+tick was developed was in response to trading departments asking: isn't there a way to save the real-time data so it can be analyzed later? While it's true that small trading problems can be solved using a real-time data or in-memory database alone, big, strategic problems require the ability to save data and to compare real-time or in-memory and historical data on the fly, without losing speed anywhere along the line.
- 5. Can kdb+ data be compressed?
kdb+ includes intelligent compression through a choice of algorithms and a broad range of settings, driving down storage requirements and latency, and optimizing CPU usage. The user is able to set the desired compression levels, by specifying which data to compress and how heavily to compress it. For example, data that's a day old might not be compressed at all, data that is up to a week old could have only selected, customer-specified, fields compressed, while data that is older than three months might be fully compressed. kdb+ allows clients to use third-party algorithms, or use Kx's default, fast, proprietary compression algorithms.
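An age-based policy like the one in this example can be expressed as a small decision function. The Python sketch below is illustrative only: `zlib` stands in for whichever algorithm is chosen, the function names and the exact thresholds (1 day, 90 days) are assumptions, and data between a week and three months old is assumed to stay in the "selected fields" tier because the text does not specify it. It does not use or represent kdb+'s own compression settings.

```python
import zlib
from datetime import date


def compression_plan(partition_date, today, selected_fields):
    """Choose a compression mode from partition age (assumed thresholds)."""
    age_days = (today - partition_date).days
    if age_days <= 1:
        return "none", set()                     # fresh data: leave uncompressed
    if age_days <= 90:
        return "selected", set(selected_fields)  # only the chosen columns
    return "all", None                           # older data: every column


def compress_partition(columns, plan):
    """Apply a plan to a dict mapping column name to raw bytes."""
    mode, fields = plan
    out = {}
    for name, raw in columns.items():
        if mode == "all" or (mode == "selected" and name in fields):
            out[name] = zlib.compress(raw, 9)    # level 9: smallest, slowest
        else:
            out[name] = raw
    return out


columns = {"sym": b"IBM" * 1000, "price": b"\x00" * 8000}
plan = compression_plan(date(2024, 6, 25), date(2024, 6, 30), ["sym"])
packed = compress_partition(columns, plan)
print(plan[0], len(columns["sym"]), len(packed["sym"]))
```

Keeping the policy separate from the compression step means the thresholds and field lists can change without touching the code that rewrites the data.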
- 6. What about connectivity to other software?
kdb+ has a very simple API, for easy connectivity to external graphical, reporting and legacy systems. There are interfaces to C/C++, Java, .Net, R, Matlab, Perl, Python and others. A WebSocket interface allows for a direct, bi-directional, full-duplex connection between a browser and an application, particularly useful for high-performance browser-based applications, such as visualizing real-time data. kdb+ also supports ODBC and JDBC to help with migrating data between kdb+ and traditional databases or applications like Excel.
- 7. Is it complicated to administer a kdb+ database?
Not at all — kdb+ is remarkably simple to manage, because native operating system routines are used for file management, including backup and restore.
- 8. Is a powerful server needed to run kdb+?
It depends on data volumes, but most customers begin with a 4- or 8-core system and grow from there. As the historical database builds up, multi-terabyte storage may be needed, but kdb+ is flexible — using local storage, SANs, or any combination.
- 9. What if a high-availability environment is needed?
Many Kx customers have implemented large, fully-redundant systems, including redundant tickerplants for kdb+tick. kdb+ supports failover, so there is no loss of data or performance. There is local logging as well as complete replication between data centers. IT departments can deploy kdb+ in distributed environments (e.g. clouds) and dynamically allocate system resources to meet real-time spikes, such as unusual peaks in market data. There is no need to over-invest in big hardware dedicated to kdb+, yet extra capacity is immediately available when needed.
- 1. Is kdb+ expensive?
No. The costs of kdb+ itself will typically be a small part of the total expenditure for an application.
When budgeting for kdb+, it is important to consider total cost of ownership, as purchasing kdb+ is usually part of a significant investment for an organization. Hardware and infrastructure are another component of this investment; kdb+ itself is typically used within a much larger framework that may affect most of a company's core business. Offsetting the costs are the simple setup and administration of kdb+, together with the high productivity of kdb+ developers.
- 2. What is the procedure for trying out kdb+?
Kx offers an onsite evaluation service to qualified prospects. In fact, in most cases, an evaluation is done to make sure that kdb+ is the right solution for the application.
If the application requirement is a good match to kdb+, Kx will help define evaluation objectives, provide a full software license, onsite installation and consulting, and email support. There is also the option of a proof-of-concept engagement with a Kx sales and service partner.
- 1. Who are Kx Systems?
Kx Systems is the developer of the high-performance database, kdb+, and its query language, q. Kx focuses solely on this database, and has been in business since 1993. Kx works closely with business partners who provide additional facilities.
- 2. Why use Kx instead of traditional databases?
Kx has become the major player in its target market of high-performance, high-volume databases. kdb+ is used instead of traditional databases because of its performance and flexibility.
In recent years, traditional databases have struggled to cope with massive increases in data volumes, and the old model of overnight reporting is no longer acceptable in real-time business. The business intelligence/OLAP/data warehousing structures that were built to make relational databases more efficient are also under increasing pressure to deliver faster analysis — and they cannot. Some in-memory databases and real-time data products deliver speed as long as the data is in memory, but they solve only a small part of the data volume and data analysis problem.
The limitations of traditional databases pushed many companies to write custom-built applications, typically storing data in flat files. These can work fine for simple queries hand-coded in C, but typically cannot support ad hoc queries, or the infrastructure needed for an enterprise database. Over time, these and other niche databases played an ever smaller role and have been replaced with kdb+.
- 3. Is Kx just a financial markets vendor?
The primary market in the early years was financial tick databases — this was the use area that first saw huge increases in data volumes.
But kdb+ is now seen in very different markets. For example, finance people use it for risk management, regulatory compliance and surveillance, and client services. IT departments have picked up the technology, and use it as a central data repository for many other applications, such as monitoring hardware performance and network throughput with performance measurements calculated in nanoseconds.
There is also increasing adoption in non-financial markets, such as telecoms, web services, and government agencies.
- 4. What support does Kx offer?
Priority technical support is considered to be an essential part of the product, and Kx offers rapid response worldwide. A large customer tells us that Kx is by far their most responsive vendor in terms of support. When issues are reported, there is a quick response from someone who knows the code, not a scripted response from an outsourced support center.
- 5. What about training, consulting, and other support?
There is an active email user forum where clients can post questions and discuss topics of interest to the community, and an active wiki for documentation, cookbooks, tutorials, and software add-ons.
Kx partner companies deliver consulting, training, and installation support globally, and also provide their own add-on dashboards, feedhandlers, and development and visualization tools.