Making Compute Abundant
A Compute.AI Article
The majority of the world’s SQL is already machine-generated. Enterprise AI is poised to accelerate AI-generated SQL by at least 1,000x. This growth will be unprecedented in its volume, concurrency, and complexity, and today’s compute platforms are inadequate for this new future.
This article explains how Compute.AI’s technology is purpose-built to solve this challenge.
TABLE OF CONTENTS
1. Evolution From DB2/Oracle to Snowflake & Databricks
2. Enter “The Lakehouse” aka “SQL directly on Parquet/Iceberg”
3. Splitting a Database into “Two Unequal Halves”
4. The Problem of Liberating Compute From Data Management
~ 4.1 AI Generated SQL – 1000x More Than Anything We Have Ever Seen
~ 4.2 Challenges in Migrating from a Cloud Data Warehouse to Lakehouse
~ 4.3 CPU Utilization: Memory Stalls, I/O Waits
~ 4.4 The In-Memory Dilemma: Overprovision Memory or Get OOM-Killed
~ 4.5 Cost of Compute
~ 4.6 The Problem Was Hiding In Plain Sight
5. The Compute.AI Solution
~ 5.1 Compute.AI Plays at the Confluence of AI/ML & Relational Compute
~ 5.2 A Brand New Microkernel
~ 5.3 Supporting Multiple SQL Dialects: Spark & Presto SQL
~ 5.4 JIT Demand Paging
~ 5.5 Performance
6. Compute.AI Use Cases: AI-Driven Sources of SQL Generation
1. Evolution From DB2/Oracle to Snowflake & Databricks
Ever since the days of Edgar F. Codd’s Relational Model of Data for Large Shared Data Banks, the formation of SQL, and the first Relational Database Management System (RDBMS), technological advances in databases have been more evolutionary than revolutionary.
Without recounting the whole history, we can see two main database architectures: OLTP and OLAP. Both are Transaction Processing systems but differ primarily in the speed of transactional consistency.
OLTP databases deal with ATM-style bank transactions that are high in number and need low-latency (sub-second) resolution, while OLAP databases are more throughput oriented, deal with large batches of data, and are the foundation of modern-day analytics. An OLAP database is what we now refer to as a Data Warehouse, and as the name suggests, it is suitable for large analytics workloads.
Aside from the ability to do Transaction Processing, OLTP and OLAP databases have another important thing in common–Data Management, which is why a relational database is called an RDBMS, a Relational Database Management System.
Relational Algebra, for which the lingua franca is SQL, is tuple-space math and works on clean, structured data. This is why we have the ETL step before inducting data into database tables, which store data in a row-and-column tabular format with a schema to describe the columns. The description of the tables and their schema is referred to as DDL, or Data Definition Language.
The next step is to actually load the data into the tables; this is called DML, or Data Manipulation Language. Together, DDL and DML allow new data, including transactional data that requires Insert/Modify/Delete (IMD) operations on the table, to be loaded into database tables.
DDL and DML form the two pillars of Data Management and have been the premise of all databases. These were steps that could not be skipped if one were to query database tables using SQL.
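DDL and DML are easy to see in miniature. The sketch below uses Python’s built-in sqlite3 module; the `sales` table and its rows are purely illustrative:

```python
import sqlite3

# In-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")

# DDL: describe the table and its schema.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# DML: load data into the table.
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# Only after DDL+DML can the table be queried with SQL.
total_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(total_by_region)  # [('east', 150.0), ('west', 250.0)]
```

Every RDBMS, open or proprietary, performs these same two steps before a single SELECT can run.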
Data Management is complex and hard to implement. We have seen and learned from the failure of Hadoop, which claimed infinite scalability for databases when built upon Google’s Map-Reduce (M-R) framework. M-R was invented for doing FILTERs and AGGREGATEs–not JOINs and GROUP BYs–on naturally parallelizable data, such as scrubbing the web’s pages and filtering keywords that would later be indexed not by M-R but by a traditional database engine like MySQL.
Hadoop claimed that M-R would be the future of databases, and it was plain wrong. Fundamental relational operations such as JOIN, GROUP BY, and SORT are not suitable for M-R. To implement the core relational operations using M-R, data needs to be shunted around on disks–a phase called shuffle–which is a very expensive operation in terms of both time and cost.
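To see why a shuffle is unavoidable, here is a minimal Python sketch of an M-R style partitioned join. The `orders` and `customers` tables and the `shuffle` helper are hypothetical stand-ins for the disk-and-network repartitioning a real M-R job performs:

```python
from collections import defaultdict

# Rows from BOTH tables must be repartitioned so equal keys land together.
orders = [(1, "book"), (2, "pen"), (1, "lamp")]   # (customer_id, item)
customers = [(1, "Ada"), (2, "Grace")]            # (customer_id, name)

def shuffle(rows, n_partitions=2):
    """Partition rows by hash of the join key. In M-R this means writing
    each partition to disk and shipping it across the network."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[0]) % n_partitions].append(row)
    return parts

order_parts, cust_parts = shuffle(orders), shuffle(customers)

# Each partition can now be joined independently (a hash join per partition).
joined = []
for p in set(order_parts) | set(cust_parts):
    names = dict(cust_parts.get(p, []))
    for cid, item in order_parts.get(p, []):
        if cid in names:
            joined.append((names[cid], item))

print(sorted(joined))  # [('Ada', 'book'), ('Ada', 'lamp'), ('Grace', 'pen')]
```

The join result is correct, but only because every row was first moved to the partition owning its key–the expensive step that simple FILTERs and AGGREGATEs never need.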
Hadoop made one other major technical mistake and created HDFS, a poor implementation of Google’s GFS, which came with high operational management overhead and was plagued with failures. However, the story that Hadoop told was big and compelling, enough to attract billions of dollars of VC money and create marketing hype that pushed CIOs not to renew their traditional Data Warehouse licenses.
To add to customer pain, Hadoop’s HDFS ran on cloud object stores like AWS S3 (think of a filesystem layered upon another file/object store) and brought along not only Hadoop’s failures but also lackluster performance as a result of the two file systems thrashing each other.
With their data in the cloud, and Oracle and Teradata licenses expiring, customers had nowhere to go and a huge problem on their hands. Along came Snowflake. Its first target was addressing the Hadoop pain felt by customers.
Snowflake’s data management solution evolved through various stages of eventually consistent architectures (described both in their original database paper and in eventually consistent object stores like AWS S3) to something that would work on the cloud.
Addressing the data management pain was monumental as data could now reliably be put into database tables and then queried using SQL, the very problem that Hadoop was trying to solve. The difference between Hadoop and Snowflake was that Snowflake was built by database engineers who understood relational databases, while Hadoop could not go beyond simple filters and aggregates.
Another important chapter in database evolution in the last 15 years has been Spark. It also capitalized on the failure of Hadoop and went with the theme of Hadoop in memory rather than on disk; that is, in-memory compute and shuffles rather than disk-based shuffles.
While Spark fixed many of Hadoop’s issues, it still had the legacy M-R architecture that was not suitable for relational work. However, with over a decade of work Spark has evolved into a powerful platform that stands on its own feet and tackles use cases that go well beyond SQL based data processing. For example, Spark is the default platform for AI/ML.
2. Enter “The Lakehouse” aka “SQL directly on Parquet/Iceberg”
Fast forward to the present and let’s pause on one of the most significant, though still evolutionary, advances in Data Warehouses: the Parquet file format. While about a decade old, what’s significant about Parquet is that it allows data to be stored as rows and columns using a schema–in other words, as a database table rendered as a flat file.
Parquet is essentially the on-disk or a filesystem friendly version of a database table. Aside from its tabular structure, a Parquet file has inbuilt metadata that is much like the database statistics that are generated during the traditional DDL/DML phase of creating database tables. These stats are used by the SQL query engine (specifically the Cost Based Optimizer (CBO) of the query engine) for various optimizations.
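The idea behind those footer stats can be sketched in a few lines of Python. This is a simplification (real Parquet statistics cover nulls, distinct counts, and more), and the row groups and helper functions below are illustrative:

```python
# Simulated row groups of a columnar file; contents are illustrative.
row_groups = [
    {"order_id": [1, 2, 3], "amount": [10, 20, 30]},
    {"order_id": [4, 5, 6], "amount": [5, 500, 50]},
]

def footer_stats(groups):
    """Compute per-row-group (min, max) per column, as Parquet writers do."""
    return [
        {col: (min(vals), max(vals)) for col, vals in g.items()}
        for g in groups
    ]

def prune(groups, stats, col, predicate_min):
    """Skip row groups whose max can never satisfy `col > predicate_min`."""
    return [g for g, s in zip(groups, stats) if s[col][1] > predicate_min]

stats = footer_stats(row_groups)
# Query: WHERE amount > 100 — the first row group (max 30) is skipped
# without reading any of its data.
survivors = prune(row_groups, stats, "amount", 100)
print(len(survivors))  # 1
```

This is exactly the kind of pruning a CBO performs before any I/O is issued, which is why stats travel with the data.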
Parquet is an open data format unlike database tables that are proprietary to each database implementation. We will soon see how this little difference is fundamental to making compute its own entity and also ubiquitous.
To get the full flavor of an open database table as captured by a Parquet file, along with more comprehensive stats for the CBO, one missing piece needs to be added. This is where Apache Iceberg comes into the picture. It provides sophisticated metadata that is created at the time of Parquet file creation and allows the full power of the CBO to be brought to bear when querying Parquet files.
Parquet+Iceberg can be queried by SQL in exactly the same fashion as a proprietary database table. As a result, Parquet+Iceberg makes database management into an open operation.
The biggest vendor lock-in for a data warehouse is its table format/structure. The secondary lock-in is proprietary extensions to SQL and stored procedures (which are typically used as data-warehouse-specific functions for writing business logic). It should be noted that stored procedures have lost favor in the open source world given their stickiness.
Typically, Spark is used for creation of Parquet+Iceberg files. Once created, these files, or groups of files called External Tables (as they reside in a Data Lake or a Lakehouse and not inside a proprietary database silo) can be queried using SQL.
Evolution continues to win, but it does not come without its drawbacks. While the Parquet/Iceberg step in database architectures freed us from siloed data and put our data into an open format, with the additional advantage of leveling the playing field for query engines (think: may the best query engine win), the evolution came with a cost, as described below.
3. Splitting a Database into “Two Unequal Halves”
Creating Parquet/Iceberg tables is in fact DDL+DML, and hence no different from loading data into a database. We must mention that Iceberg went further to provide support for micro-batch updates, transactions, streaming data, and everything one would expect from a traditional Change Data Capture (CDC) process.
Even though Parquet, and later Iceberg, were evolutionary steps in making proprietary database tables and their metadata open, Compute.AI viewed the end result in a different light–What Parquet/Iceberg had achieved was to split a traditional database architecture into two parts. This is the single monumental step in the evolution of data warehouses where we have achieved true separation of data management (DDL+DML including transactions, snapshots, checkpoints, etc.) from querying data.
Once Parquet/Iceberg tables have been created, all that SQL needs to do is query them in a read-only manner. This allows the query engine to be simpler, smaller in footprint, easier to build, and more robust as a result. Transactions, for example, can be resolved by Iceberg. The transactional capabilities of Iceberg are somewhat coarse and should not be compared to an OLTP-style workload, and they are likely slower than in-memory database architectures (Snowflake, Spark, Presto/Trino), making their applications use-case specific.
With Compute.AI’s extensions to Iceberg, micro-batched transactional updates of 100 million transactions per minute are quite achievable. These extensions help us meet the requirements of most analytical workloads with full Iceberg-based transaction capabilities.
A simpler SQL engine can be a boon for addressing some of the other pertinent issues discussed in the rest of this article. Addressing these pain points has been the focus of Compute.AI. We begin with certain assumptions and then target use cases that the incumbent in-memory architectures cannot handle.
While it is theoretically possible to take Spark or Presto/Trino and rip out features like transaction management to reduce it to a simple SQL engine that operates on read-only data, the practical reality is that doing so would be far too complex. It’s a bit like taking the skeletal and nervous subsystems out of a human being to create a simpler being.
4. The Problem of Liberating Compute From Data Management
With Parquet/Iceberg we now have an opportunity to query data (now in an open format) differently. We have an opportunity to shed the burden of doing transactions (as just one example) as a part of compute. Transactions, at least for analytical workloads, can be destaged into Parquet/Iceberg, and that allows us to build elegant and relatively simple query engines that are incredibly fast, scalable, and provide unlimited concurrency.
The simplicity contributes to making an engine built upon this “read-only” data paradigm (where the IMD operations are handled by the Parquet/Iceberg stage) smaller in footprint, fault resilient, and something that we can consider putting on every server, compute, and edge device without hesitation.
Compute can thus be made abundant and just another substrate, though a powerful one, in the stack of hardware, OSes, hypervisors, and finally containers. We have arrived upon the opportunity of making compute ubiquitous, imprinted everywhere there is a need for compute–call this the Operating System for Compute.
4.1 AI Generated SQL – 1000x More Than Anything We Have Ever Seen
In creating an Operating System for Compute, Compute.AI did not have to rethink database architecture in the way one would imagine.
At first, it was about what one can do to simplify a relational query engine if DDL/DML, transactions, checkpoints, rollbacks, and point-in-time recovery (also called time travel) were all handled as a pre-step. This first step resulted in significantly smarter, tighter, and faster query processing that was powerful and could be built reliably in short order.
However, the vision of making compute abundant, infinitely concurrent, and scalable enough to meet the 1000x greater demands of AI-generated SQL was not going to be realized by this alone. Compute, aside from its complexity, in all its database/warehousing renditions, was consuming a large amount of hardware/infrastructure.
Our love for in-memory database architectures allowed us rapid SQL engine prototyping, and even beating these prototypes into shape to be monetized as products. But the bottom line remains: these architectures are very memory-hungry.
4.2 Challenges in Migrating from a Cloud Data Warehouse to Lakehouse
Many customers we talk to find cloud data warehouses easy to use because of the hosted SaaS feature where only a single SQL JDBC endpoint is presented to the user to do all their work. All of that while requiring no devops expertise in-house. Also, cloud data warehouses like Snowflake are known to be reliable. These considerations are key in keeping customers locked in.
However, as data volumes increase and the amount of compute increases, the data warehouse costs start to play a significant role in business decision making. There is also the feeling of being locked into a proprietary silo, a brick walled garden.
Cost and vendor lock-in are the two primary motivations for businesses to seek an open Lakehouse solution where they can run Spark, Presto/Trino on Parquet/Iceberg files and external tables.
Unfortunately, the move from a cloud data warehouse to a Lakehouse is not without its challenges. Going to open source platforms means that the business must have devops and data engineers. The ease of use of a warehouse does not exist when working on open source platforms. Also, open source platforms are notorious for their Out-of-Memory failures–something that is unacceptable for production workloads.
Hence, even though it is technically possible for BI apps to be served directly from the Lakehouse, the practical reality is that a cloud data warehouse appeals to many businesses as a better solution.
4.3 CPU Utilization: Memory Stalls, I/O Waits
Now add to the above (the issue of memory hungry in-memory architectures), the stagnation of CPU and Memory clock frequencies for the last decade and a half.
Neither CPUs nor memory have gotten much faster in that time period. While fabrication advances allow us to pack more transistors onto a silicon wafer at high yields, none of that addresses the worsening core-to-memory ratios that result from more cores accessing the same clogged memory buses.
With more transistors on a processor, one can put them to use in two primary ways: turn them into more cores, turn them into larger processor caches, or some combination thereof.
With the spotlight on compute, we have voted in favor of more cores for the most part. As a separate observation, there is no caching architecture that can work for transient or ephemeral data which are very typical of analytics workloads. Caches are useful if the same data is accessed over and over.
As a result, we are at the point where we have a fairly large number of cores that access a finite number of memory chips or DRAMs. While DRAMs tend to get much more expensive as you increase memory density (to fit into the finite number of server slots), cloud providers have done well to amortize the cost of memory and charge linearly as you use more memory (to satisfy the demands of in-memory software architectures).
Given that the dominant cost of cloud server infrastructure is actually memory, there is a great need to go elastic to meet the demands of in-memory compute. Hence, we are left with larger clusters with more cores…cores that are stalled getting to the data in memory (memory stalls), or waiting for data to appear from other nodes in the cluster, or from data being demand-paged (now referred to as spill-to-disk), which is called an I/O wait. These are the two main reasons for low CPU utilization.
Larger data typically means more nodes to accommodate the demands of larger memory-intensive workloads, and hence an exacerbation of the memory stalls and I/O waits, all of which nets out to low CPU utilization. 30% CPU utilization across elastic clusters (that means 70% idle CPU) is fairly common, and even that number may be a bit on the optimistic side.
Keep in mind that we are paying rent for our cloud infra by the second or minute. If 7 out of 10 cores are idle, that is like leaving 70 cents of every dollar on the table.
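The arithmetic is easy to run. The hourly rate below is an assumed example price, not any vendor’s quote:

```python
# Back-of-the-envelope cost of idle cores at 30% CPU utilization.
hourly_rate = 10.0          # $/hour for an elastic cluster (assumed example)
utilization = 0.30          # fraction of CPU actually doing useful work

hours_per_month = 24 * 30
monthly_bill = hourly_rate * hours_per_month
wasted = monthly_bill * (1 - utilization)   # rent paid for idle cores

print(f"monthly bill: ${monthly_bill:,.0f}, paid for idle cores: ${wasted:,.0f}")
# monthly bill: $7,200, paid for idle cores: $5,040
```

At any realistic cluster size, the idle-core line item dwarfs most other infrastructure costs.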
4.4 The In-Memory Dilemma: Overprovision Memory or Get OOM-Killed
The other major issue that we see is the overprovisioning of memory when running workloads. The challenge here is that spill-to-disk algorithms (when you are low on memory, you write data to disk and read it back later while evicting other data that you do not immediately need) are slow.
Remember the days of the Linux kernel paging when it was time to take a coffee break? In other words, paging is so slow that it makes writing to disk, or even SSD, and reading from it to virtualize physical memory an impractical algorithm.
Many open source architectures have done a good job of spill-to-disk since provisioning memory for the worst case requirements is impractical, but these algorithms have not been incorporated for all relational operations.
Understand that the basic set of relational operations is FILTER, AGGREGATE, SORT, and JOIN, soon followed by PROJECT, GROUP BY, MERGE, UNION, and WINDOW. The level of difficulty in making each of these algorithms survive in a minimal amount of memory and offer solid performance while spilling to disk is extraordinary. Hence, we see spill-to-disk being used in some but not all of them.
So what do we do when we need to execute an algorithm where spill-to-disk is not implemented? The in-memory databases simply fail the job. It is as easy as that. It is called an Out-of-Memory failure; the exception is thrown by the software, and there is no recourse except to rerun the job with a larger amount of memory or with a smaller amount of data. Both solutions come with severe tradeoffs.
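For the operations where spill-to-disk does exist, SORT is the canonical example: sort memory-sized chunks, spill each sorted run to disk, then stream a k-way merge. A minimal Python sketch (with an artificially tiny memory budget) looks like this:

```python
import heapq
import os
import tempfile

def _spill_run(run, directory):
    """Write one sorted run to its own temp file and return the path."""
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(map(str, run)) + "\n")
    return path

def _stream_run(path):
    """Lazily yield values from a spilled run, one line at a time."""
    with open(path) as fh:
        for line in fh:
            yield int(line)

def external_sort(values, memory_budget=4):
    """Sort arbitrarily large input while holding at most `memory_budget`
    values in memory during the run phase, spilling sorted runs to disk."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [
            _spill_run(sorted(values[i : i + memory_budget]), tmp)
            for i in range(0, len(values), memory_budget)
        ]
        # The k-way merge holds only one value per run in memory at a time.
        return list(heapq.merge(*(_stream_run(p) for p in paths)))

print(external_sort([9, 1, 8, 2, 7, 3, 6, 4, 5]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Even this simple case shows why spill-to-disk is slow: every value is written to and read back from storage, which is orders of magnitude slower than DRAM.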
So the industry came up with a solution: overprovision memory. If one can provide adequate memory for a particular job (this can be determined by trial and error or empirically in most cases, with the exception of unpredictable cases where dynamic compressions and rarefactions within the data can cause memory explosions that cannot be predicted), then we have lessened the chances of an Out-of-Memory failure. This failure has a term among engineers: an OOM-kill, the Out-of-Memory kill that results in the job being killed.
When we overprovision memory, we are buying more cloud infra for the worst case rather than the common case, to protect against an OOM-kill occurring in our production workloads.
Buying more cloud infra mainly translates to buying more memory; but remember that more memory is dished out along with more cores on all major clouds. And the more the cores, the lower the CPU utilization, for reasons we now understand.
This is the problem we find when we look deeper into the cost-of-compute issue. Of course, we are not talking about software licensing costs here, which are another factor and often a multiple of the actual infra cost a database vendor will charge. What we have done is take a look at the fundamentals of the cost structure.
There are other factors, such as skew in data and the overhead of shuffling data/indexes resulting from skew. These issues require an article of their own and are best only mentioned here as factors contributing to poor CPU utilization.
4.5 Cost of Compute
One can argue that the problem did not need to be solved until now, because few customers were highlighting the cost of compute as their pain point. Now, the situation is different.
The cost of compute is one of the top concerns when it comes to data processing. If we have 30% or less CPU utilization for today’s SQL, how on Earth are we going to meet the demands of compute that will increase by at least 1000x if not 10,000x due to AI and machine generated sources of SQL?
This is the million, or a billion, dollar question that we are faced with today, both as customers or entrepreneurs looking to tackle the hardest pain points in the industry.
Coming back to the stagnated CPU and memory frequencies…between these hardware limits that we have hit, the growing need to process massive amounts of ephemeral data (data that you see once and have to process fast to glean those deep, meaningful insights), and in-memory architectures, this whole space is begging for ground-up innovation.
Some of this is the “hardware catches up with software, and then software catches up with hardware” cycle that we see in the industry (it’s been captured quite well by an a16z video series), and that is where we are at.
4.6 The Problem Was Hiding In Plain Sight
For Compute.AI, the problem was hiding in plain sight, and solving it is what we set out to do when we founded our company.
However, addressing the issue of better infrastructure and memory utilization was not trivial. One had to dive a few layers deeper to understand the symptoms that businesses characterized as their pain when they cried, “Help, our compute costs are too high. How can we reduce them?”
None of the businesses or customers we talked to or worked with said that their memory utilization was high, or that their CPUs were barely 30% utilized (a generous number as you go to larger elastic clusters).
Was it that our love for in-memory database architectures was running up against the worsening core-to-memory ratios of modern-day processors? At least on this one topic the answer is no; or certainly not in a direct way.
To refresh our memories, the core to memory ratio issue is why we see memory stalls. Is there anything that software can do to prevent these memory stalls?
To better address this we need to go one level deeper. What is the root cause of a memory stall? The answer is that you basically have many cores going after data in memory at the same time and have run out of memory bandwidth.
Think of an entry ramp onto a freeway where you are stuck in your car during peak hours, waiting at the traffic light so you can jump onto the busy freeway. This is analogous to what happens when a core stalls on memory.
One reason this happens is that there is simply too much data moving rapidly through memory (ephemeral data that needs to be processed); the other issue is non-trivial and has to do with how processors work at a low level. CPU cores work on instruction and data pipelines. Synchronization across these cores is expensive and comes with a cost…a CPU stall.
There is no easy way to get rid of the various causes of processor pipeline stalls. In fact, advances in software, especially in programming languages, have made it fairly easy to write complex multi-threaded code, but the low-level constructs and primitives they ship with contribute to processor pipeline stalls and in general aggravate the memory stall problem. But, as they say, in software nothing is without its tradeoffs.
Maybe we can illustrate this with an example. Be warned that there are some sacred cows here and in no way is this blog professing that one must or must not do something.
Let’s take the C++ STL library as an example. When writing low-level code that runs naked on silicon, one needs to exercise extreme caution about which aspects of the STL are being used. Some of them, as we discussed earlier, allow rapid prototyping but come with a cost; others contain highly optimized algorithms that would actually be hard for mere mortal programmers to write. That seems a bit harsh to say, but that is the exact nature of what we are talking about.
So rather than get into a religious war over what aspects of a language or its supporting libraries one can use to address the memory stall issue and the worsening core to memory ratios, it may make better sense to discuss how our engineers at Compute.AI addressed the issue.
Keep in mind that, although we have a small team, it is composed of highly talented and experienced engineers; hence, what is said in the sections below may not translate to something that is practical for most companies to implement. This team has years of experience developing low-level software that is highly machine-optimized.
5. The Compute.AI Solution
5.1 Compute.AI Plays at the Confluence of AI/ML & Relational Compute
The current hype surrounding GenAI and LLMs might have done some disservice to the larger realm of AI/ML that includes reasoning based solutions. Separately, there are a large number of models (outside of GenAI) that have been built in the last decade to solve domain specific problems (credit card fraud, healthcare, IT, Telecom, etc.). These models have primarily used Spark and its ecosystem for AI/ML. Databricks recently provided statistics that said 1 out of 3 models developed is now in production, an astounding number. However, with more and more use cases exploiting GPT-4 and going beyond text into training with multi-modal datasets, the number of use cases for GenAI continues to be on a rapid rise and puts into question the wide use of non-GPT models.
Many very large-scale models use Python-based Map-Reduce frameworks (for example, Apache Flume) for both training and operationalization. However, SQL engines like Google BigQuery, Spark, and Presto/Trino support the invocation of AI models as SQL UDFs. SQL is superior to Python in its ability to recruit a large number of CPU cores, run models on GPUs and TPUs, and feed them with data from relational (structured) and vector (semi-/unstructured) databases, all while using Python interfaces. Over the next decade there will be some standardization on frameworks to train and deploy AI/ML models.
We are fast arriving at the reality of real-world workloads that heavily recruit both CPUs and GPUs at cloud scale to solve business problems that create great value. To focus only on AI/ML and tie that to GPU/TPU-only processing would be shortsighted on our part. For real-world applications we will need to run AI models on many GPUs and provide them with data from the aforementioned data sources (relational and vector). With SQL as the fabric that ties this together, we will see model deployment and operationalization use CPUs and GPUs in unison. SQL is a platform that will have a role, but in a way that is still evolving. Compute.AI’s focus is on making the compute highly efficient regardless of which frameworks become popular.
5.2 A Brand New Microkernel
As a first step, we wrote a microkernel in C++ from scratch, custom built to process Relational Algebra, i.e., SQL. We worked with very low-level C++ constructs, used shallow inheritance only where aesthetics and manageability absolutely demanded it, and were willing to pay the dynamic-dispatch overhead that comes along with virtual functions.
We minimally used implementation inheritance, which is a fancy way of saying that we minimized templates–which are just macros and can cause code bloat along with instruction cache and pipeline issues.
We got rid of mutexes in fast-path algorithms and used logical, Lamport-style clocks. That is, we embraced an ordering style of code rather than test-and-set or compare-and-swap style resolutions, which come with exactly the memory-stall side effects we are trying to eliminate.
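As an illustration of ordering-based coordination (a sketch of the general Lamport-clock idea, not of Compute.AI’s internal implementation):

```python
# Lamport-style logical clocks: events are ordered by counters that are
# merged on message passing, with no shared lock between participants.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the logical clock."""
        self.time += 1
        return self.time

    def merge(self, received_time):
        """On receiving a message, jump past the sender's clock."""
        self.time = max(self.time, received_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
a.tick()              # a: 1
stamp = a.tick()      # a: 2, and a sends a message stamped 2
b.tick()              # b: 1
b.merge(stamp)        # b: max(1, 2) + 1 = 3
print(a.time, b.time) # 2 3
```

Each participant only ever touches its own counter, so there is no contended cache line for cores to stall on.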
We “banned” the use of the STL–not because the STL is bad, but to protect the code from inadvertent use by a future engineer who might bring a primitive like a shared lock into a fast-path algorithm and cause issues that are hard to figure out or fix later.
We avoided the use of POSIX threads and created a new style of context switching (patent pending) that we believe is one of the fastest ways to switch between lightweight threads.
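The flavor of user-level context switching can be illustrated with Python generators; this toy scheduler is purely illustrative and is not the patent-pending mechanism described above:

```python
# Generators yield control cooperatively with no kernel involvement,
# which is why user-level context switches can be so cheap.
def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"   # suspend here; the scheduler resumes us later

def round_robin(tasks):
    """A toy cooperative scheduler over lightweight tasks."""
    trace = []
    while tasks:
        task = tasks.pop(0)
        try:
            trace.append(next(task))  # run the task for one step
            tasks.append(task)        # re-enqueue it at the back
        except StopIteration:
            pass                      # task finished; drop it
    return trace

trace = round_robin([worker("a", 2), worker("b", 2)])
print(trace)  # ['a:0', 'b:0', 'a:1', 'b:1']
```

Every switch here is an ordinary function return, with no syscall and no kernel scheduler in the path.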
The end result was a tiny compute engine that is extremely efficient and addresses at least some of the key issues that affect memory stalls. The beauty of this code is that it has been signed, sealed and delivered. That is, this code is frozen, we have not had to touch it in the last 18 months, and expect little in the form of future changes and maintenance.
Aside from the low level details on how the microkernel was designed, as we just discussed, it came with a set of capabilities that are very specific to databases.
It published an API that made it very easy to build all the relational operators needed to support SQL. This helped with rapid scaling in development that is needed to support SQL grammar, builtins, conformance with open source interfaces, etc.
All said and done, we built a brand new database platform written from scratch in less than 2 years.
5.3 Supporting Multiple SQL Dialects: Spark & Presto SQL
Once we had built the microkernel for relational compute, all we had to do was allow the operators to be executed when SQL is issued.
To achieve this we took the logical SQL plan–basically a DAG, or Directed Acyclic Graph–generated by any SQL parser and passed it to the operators. This is almost like the JVM, where a set of bytecodes is executed by the Java Virtual Machine.
Having such an architecture allows us to execute a SQL plan that is generated by any open dialect of SQL. It is for this reason that we are able to support Spark and Presto SQL.
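Executing a plan this way can be sketched as a chain of pull-based operators. The plan shape, row format, and operator names below are illustrative, not Compute.AI’s API:

```python
# Each node in the logical-plan DAG is an operator that pulls rows
# from its child, independent of which SQL dialect produced the plan.
def scan(rows):
    yield from rows

def filter_op(child, predicate):
    for row in child:
        if predicate(row):
            yield row

def project(child, columns):
    for row in child:
        yield {c: row[c] for c in columns}

# Logical plan for: SELECT name FROM t WHERE amount > 100
rows = [
    {"name": "a", "amount": 50},
    {"name": "b", "amount": 150},
    {"name": "c", "amount": 200},
]
plan = project(filter_op(scan(rows), lambda r: r["amount"] > 100), ["name"])
print(list(plan))  # [{'name': 'b'}, {'name': 'c'}]
```

Because the operators only see the plan DAG, any parser that emits the same plan shape–Spark SQL or Presto SQL–can drive the same execution engine.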
When talking about these two dialects of SQL, people often associate Spark SQL with batch jobs, and Presto or Trino with ad hoc jobs. That is not the purpose of our support for these two dialects.
We chose the two most prevalent dialects of open SQL that the world uses to talk to a database, and we support them for whatever job is flung at them: batch or ad hoc.
Our differentiation is not in batch or ad hoc but in our ability to use CPU and memory in the most optimal manner in the industry and to enable a new set of use cases that the world of AI/ML desires. This is what we should dive into next.
What is the state of the union when it comes to analytics and where are we headed?
Let’s start by saying that compute efficiency today is poor at 30% CPU utilization, and infrastructure costs, resulting primarily from the overprovisioning of memory, are the dominant issues behind the pain voiced by customers…the pain of high compute costs that burn a hole in their pockets and actually put a question mark on the business viability of tackling certain problems in analytics, both in terms of SLAs and cost.
5.4 JIT Demand Paging
Keeping this fast-arriving future in mind, we quickly see that today’s compute engines, whose efficiency and costs can barely keep up with today’s workloads, require a fundamental redesign to meet the needs of tomorrow.
We have already discussed how Compute.AI tackled the problem of memory stalls. The other major problem when you try to process large amounts of data with a small amount of memory is that of efficient spill-to-disk algorithms.
It is well understood in the annals of computer science that canonical demand-paging algorithms have been pushed about as far as they can be, and while that has helped, none of it moves the bottom line. Spill-to-disk continues to be impractically slow for production workloads, give or take a little. So what can we do here?
Without getting too much into our IP, we decided to try a completely new approach. We leveraged AI/ML–specifically, AI pattern recognition algorithms–to model our demand paging. The result is Just-In-Time (JIT) paging that is able to fetch data from lower tiers of storage into memory, and evict it from memory, at sub-second boundaries.
JIT paging has allowed Compute.AI to run with as much as 100-300x memory overcommitment, with no visible loss in performance at 10x overcommit.
With such massive levels of overcommitment one does not have to provision memory for the worst case. All jobs are guaranteed to complete and the dreaded OOM-kill is itself now nuked. All of this while achieving unlimited concurrency.
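For intuition only, here is what a classical LRU page cache over a much larger backing store looks like; Compute.AI’s JIT pager replaces the LRU policy with learned access-pattern prediction, which is not shown here:

```python
from collections import OrderedDict

# A small "memory" fronts a much larger dataset, evicting the least
# recently used page on each miss once memory is full.
class PageCache:
    def __init__(self, capacity_pages, backing_store):
        self.capacity = capacity_pages
        self.backing = backing_store      # simulates disk/object storage
        self.memory = OrderedDict()
        self.hits = self.misses = 0

    def read(self, page_id):
        if page_id in self.memory:
            self.hits += 1
            self.memory.move_to_end(page_id)    # mark most recently used
        else:
            self.misses += 1                    # "page in" from storage
            if len(self.memory) >= self.capacity:
                self.memory.popitem(last=False) # evict the LRU page
            self.memory[page_id] = self.backing[page_id]
        return self.memory[page_id]

# 4 pages of memory over 100 pages of data: a 25x overcommitment.
store = {p: f"data-{p}" for p in range(100)}
cache = PageCache(4, store)
for p in [0, 1, 2, 0, 1, 3, 0, 99]:   # access pattern with reuse
    cache.read(p)
print(cache.hits, cache.misses)  # 3 5
```

The limitation of LRU is visible even here: it only reacts to the past, whereas a predictive pager can begin fetching a page before the first core asks for it.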
Compute.AI focuses on leveraging the vertical memory hierarchy available in the cloud. We already have multiple tiers of storage, from memory to NVMe-based SSDs, networked SSD-based file systems, and slower object storage.
If we can move data across these tiers and get it in front of the cores just as they touch it–without the cores stalling or software threads of execution sitting idle while disk or network I/Os complete–we will have virtually extended memory to provide overcommitment without an appreciable loss in performance.
The tighter vertical memory footprint on a single server is conducive not just to running workloads faster but to demonstrating a 2-10x performance increase as compute complexity and/or concurrency grows.
Initial benchmarks show ~5x infrastructure reduction and up to 10x higher performance for complex SQL. That is, about a 50x improvement in price/performance is reasonably achievable.
6. Compute.AI Use Cases: AI-Driven Sources of SQL Generation
With the advent of AI/ML, the amount of machine-generated SQL is guaranteed to increase. Even today, BI apps like Tableau, PowerBI, and Looker generate at least 10x more SQL than humans write. To meet the needs of use cases such as AI-driven BI, autonomous semantic-layer generation, and low-code/no-code apps, we will need the ability to process 1,000 to 10,000x more SQL than today–and all of this at a very high level of concurrency. Think unlimited concurrency.
To summarize, Compute.AI leverages many of the evolutionary advances in databases.
The separation of DDL+DML from DQL, as led by Parquet/Iceberg, is a significant moment: we have an opportunity to view databases in a new light and create technologies that not only solve today’s problems while playing in a rich ecosystem of data-processing vendors, but also future-proof our customers’ investment in our products as they step into the fast-arriving future of AI-generated SQL.
This future will be enabled by our strongly differentiated technology, with which we create immense value for businesses and make compute abundant, just as oxygen is on the planet.