The current market data architectures are based on a theory and approach that is no longer valid. Current systems were conceptualized and implemented when message rates were orders of magnitude lower than they are today. The underlying architectural premise was that market data was moved to the applications and/or the analytics that required it. This approach tacitly assumed that market data rates would grow at roughly the pace of Moore's law; had that been true, increases in processing power and network bandwidth would have kept the architecture viable. In fact, market data rates are doubling every year, while Moore's law is delivering processing capability that doubles every 18 months. Compounded over the five years from 2003 to 2008, market data rates grew 32X while processing power grew only about 10X. The conclusion is not debatable: the architectures must change.
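The compounding is easy to check. Five years is 60 months: that is five 12-month doublings for market data (2^5 = 32X), but only 60/18 ≈ 3.3 doublings at 18 months each for processing capability, roughly 10X (three full doublings, or 8X, take 4.5 years). Either way the gap exceeds 3X and keeps widening. A minimal sketch of the arithmetic, taking the two doubling periods from the text above:

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months` for a quantity with the given doubling period."""
    return 2 ** (months / doubling_period_months)


PERIOD_MONTHS = 5 * 12  # 2003 to 2008

data_growth = growth_factor(PERIOD_MONTHS, 12)  # market data rates: double every year
cpu_growth = growth_factor(PERIOD_MONTHS, 18)   # processing capability: doubles every 18 months

print(f"market data: {data_growth:.1f}X")               # market data: 32.0X
print(f"processing:  {cpu_growth:.1f}X")                # processing:  10.1X
print(f"gap:         {data_growth / cpu_growth:.1f}X")  # gap:         3.2X
```

The gap itself compounds: it is 2^(60/12 - 60/18) = 2^(5/3) after five years and grows every year after that.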
There are two major considerations for the next-generation architecture. One is the sheer mass of data and the other is the speed at which that data can be analyzed.
Mass of data
We can confidently project market data volumes of a terabyte per day in the near future, growing well beyond that. Applications will continue to analyze data to figure out what happened in the past, whether that was a second, a minute, a day, a week, a month, or a year ago. It is not possible to move that sheer mass of data to the application for analysis. The analytics need to move to the data.
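A minimal sketch of what "moving the analytics to the data" means, under stated assumptions: `TickStore`, `fetch`, `run`, and the VWAP analytic are hypothetical names chosen for illustration, and the store lives in the same process here, whereas a real deployment would execute `run` on the machine that holds the data. The point is what crosses the boundary: `fetch` ships every matching tick to the caller, while `run` ships only the analytic's result.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Tick:
    symbol: str
    price: float
    size: int


class TickStore:
    """Hypothetical store holding ticks where they were captured.

    `fetch` models the old pattern (move the data to the application);
    `run` models the new one (move the analytic to the data).
    `ticks_shipped` counts ticks that cross the boundary to the caller.
    """

    def __init__(self, ticks: List[Tick]) -> None:
        self._ticks = list(ticks)
        self.ticks_shipped = 0

    def _select(self, symbol: str) -> List[Tick]:
        return [t for t in self._ticks if t.symbol == symbol]

    def fetch(self, symbol: str) -> List[Tick]:
        rows = self._select(symbol)
        self.ticks_shipped += len(rows)  # every matching tick is shipped out
        return rows

    def run(self, symbol: str, analytic: Callable[[List[Tick]], T]) -> T:
        # The analytic executes beside the data; only its result is returned.
        return analytic(self._select(symbol))


def vwap(ticks: List[Tick]) -> float:
    """Volume-weighted average price over a list of ticks."""
    volume = sum(t.size for t in ticks)
    if volume == 0:
        raise ValueError("no volume")
    return sum(t.price * t.size for t in ticks) / volume


store = TickStore([Tick("ABC", 10.0, 100), Tick("ABC", 11.0, 300), Tick("XYZ", 50.0, 10)])
print(store.run("ABC", vwap), store.ticks_shipped)    # 10.75 0
print(vwap(store.fetch("ABC")), store.ticks_shipped)  # 10.75 2
```

The answer is the same either way; what differs is that `fetch` moved every tick to compute a single number. At a terabyte per day, that ratio is what makes moving the data to the application untenable.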
Speed of processing - programming models
Moore's law has not delivered the single-processor speed increases required to keep up with the explosion of market data volumes. To compensate, the processing complex has evolved to include multiple processors that operate in parallel. Multiprocessor servers are commonplace, and multi-core processors are now becoming the standard in high-performance computing. The major problem is that the vast majority of software was written with languages and compilers designed when hardware had a single processor and the predominant software architecture was, appropriately, single-threaded. To take full advantage of multiple cores, software needs to be written from the ground up to be multi-threaded. This is very difficult because the languages and compilers are basically optimized for single-threaded use.
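One common way to structure market data processing for multiple cores is to partition the message stream by symbol, so that each worker thread owns the state for its symbols and needs no locks. The sketch below assumes a hypothetical `(symbol, price, size)` message tuple and computes per-symbol traded volume; `worker` and `process` are illustrative names, not an existing API.

```python
import queue
import threading
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical message shape: (symbol, price, size)
Message = Tuple[str, float, int]


def worker(inbox: queue.Queue, volumes: Dict[str, int]) -> None:
    """Consume messages for the symbols routed to this worker.

    Only this thread touches `volumes`, so no lock is needed.
    """
    while True:
        msg: Optional[Message] = inbox.get()
        if msg is None:  # sentinel: producer is done
            break
        symbol, _price, size = msg
        volumes[symbol] += size


def process(messages: List[Message], n_workers: int = 4) -> Dict[str, int]:
    inboxes = [queue.Queue() for _ in range(n_workers)]
    states: List[Dict[str, int]] = [defaultdict(int) for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(inbox, state))
        for inbox, state in zip(inboxes, states)
    ]
    for t in threads:
        t.start()
    for msg in messages:
        # A stable hash of the symbol picks the owning worker, preserving per-symbol order.
        owner = zlib.crc32(msg[0].encode("ascii")) % n_workers
        inboxes[owner].put(msg)
    for inbox in inboxes:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Symbol sets are disjoint across workers, so merging is a plain update.
    merged: Dict[str, int] = {}
    for state in states:
        merged.update(state)
    return merged


msgs = [("ABC", 10.0, 100), ("XYZ", 50.0, 10), ("ABC", 10.5, 200), ("DEF", 3.0, 5)]
print(sorted(process(msgs).items()))  # [('ABC', 300), ('DEF', 5), ('XYZ', 10)]
```

Routing by symbol keeps all updates for one instrument on one worker, in arrival order, so the per-symbol state never needs a lock. Note that in CPython the global interpreter lock keeps CPU-bound threads from running truly in parallel, so a production version would use processes or a language without that restriction; the sketch shows the decomposition, not the speedup.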
Speed of processing - hybrid hardware/software systems
For certain operations, general-purpose processors running software are orders of magnitude slower than special-purpose hardware tightly integrated with application software. To cope with the explosion of market data, software-only solutions running on general-purpose processors will not work. Field Programmable Gate Arrays (FPGAs) are one class of specialized hardware being used to perform compression, message parsing, and similar jobs significantly faster than is possible with software-only solutions.
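To make "message parsing" concrete, here is a software-only baseline that decodes fixed-width binary quote messages. The 32-byte layout (8-byte symbol, two scaled int64 prices, two uint32 sizes) is hypothetical and does not correspond to any particular exchange feed. Every message costs a round of per-field decoding on a general-purpose CPU, and this per-message work is the kind that FPGA-based feed handlers move into hardware.

```python
import struct
from typing import Iterator, NamedTuple

# Hypothetical wire layout, big-endian, no padding (32 bytes per message):
# 8-byte space-padded ASCII symbol, bid and ask as int64 in 1/10,000ths,
# bid size and ask size as uint32.
QUOTE = struct.Struct(">8sqqII")
PRICE_SCALE = 10_000


class Quote(NamedTuple):
    symbol: str
    bid: float
    ask: float
    bid_size: int
    ask_size: int


def parse_quotes(buf: bytes) -> Iterator[Quote]:
    """Decode back-to-back fixed-width quote messages from one buffer."""
    if len(buf) % QUOTE.size:
        raise ValueError("buffer does not hold a whole number of messages")
    for raw_symbol, bid, ask, bid_size, ask_size in QUOTE.iter_unpack(buf):
        yield Quote(
            raw_symbol.decode("ascii").rstrip(),
            bid / PRICE_SCALE,
            ask / PRICE_SCALE,
            bid_size,
            ask_size,
        )


buf = QUOTE.pack(b"ABC".ljust(8), 101_250, 101_300, 500, 700) * 2
for quote in parse_quotes(buf):
    print(quote)
```

Fixed-width fields are the easy case for software; the cost that matters here is that this loop runs once per message, so its total work grows in step with message rates.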
Next topic - how the gravitational pull of data sucks in the analytics