Data science is front and center in technology trends today. Putting data science to work in specific use cases is the true challenge. To gain the most benefit, the technology and the use case have to match, bringing the best of both worlds to understand and propose solutions in specific situations.
In that context, let us examine the combination of IIAS – the IBM Integrated Analytics System (or “Sailfish” as it is fondly called by those in the know) – and DB Lytix™ – a collection of in-database algorithms and recipes from Fuzzy Logix. Together, they can (a) accelerate or improve an existing analytics use case or (b) permit creation of new use cases that were previously thought intractable.
First – a little background on first generation, or legacy thinking on analytics: databases and warehouses were effective storage and querying tools, and analytics software sat “outside” of the EDW appliance, designed to do the mathematical heavy lifting using a proprietary programming paradigm. This thinking is at least three decades old and, to be fair, served us well. Data sets were small, and in minutes one could copy them over to the analytics appliance and perform analytics.
Around a decade ago, customers began to realize that as their data sets grew each year, they had to keep upgrading the analytics hardware just to keep up. Analytics is inherently iterative, so each time a different cut/slice of the data was needed, moving that data took valuable time away from the actual analytics. Security in some industries became a challenge. There were tons of rules put in place about which tables/views could be seen by whom, and the minute a copy of the data was made and transported “out” of the EDW, many of those rules went out the window.
One of the first appliances to truly challenge the notion of analytics “outside” of the EDW was the Netezza/PureData series of appliances. It still enjoys a loyal fan base of users who understand the differentiation it brought to the table. It provided a platform for analytics to happen “in-database.” Fuzzy Logix, a pioneer of in-database analytics, leveraged the framework and coded a comprehensive library of analytics algorithms that exploited the massive parallelization available within the appliance. This was a successful seven-year partnership. Today, IBM offers a new data warehouse system (IIAS), and Fuzzy Logix offers an even better DB Lytix toolkit that has grown in experience and performance. Together you get the enviable “IIAS + DB Lytix” combination.
So, what are some of the use cases this combination has been able to solve commercially? Let us look at these across a couple of industries: Healthcare and Finance.
1. Early Detection of Chronic Illnesses
Each one of us knows someone (within a couple of degrees of separation) who is suffering from diabetes. One in every eleven people has it. As a country, we spent $322Bn on diabetes care alone, and this is just one such chronic illness. Fortunately, as with many chronic conditions, early detection and intervention might defer the onset of the disease by 2, 5, or 10+ years. This results in an absolute win-win-win: better quality of life for the patient and family; much lower costs for the insurer; and lower insurance premiums for everyone. The analysis is complex: one has to sift through a myriad of signals, from various test scores to patient data such as additional conditions and family history, to predict with a high degree of confidence whether someone is a candidate for early screening. Thanks to in-database analytics, this process can churn through billions of records and execute 10-100x faster, allowing the customer to re-run this analysis in minutes and on a daily basis, versus the previous cycle of a month to six weeks.
2. Scoring Provider Networks
Another problem every insurer has is to measure and control the quality of care within their provider network. It is a fairly difficult problem, because each “episode” of care involves multiple tests, diagnoses, doctor’s visits and so on, before one can even start to measure what the overall experience and outcome were for the patient. This has to be measured against other providers who are treating the same malady/condition. For a large US provider with 35 million plus insured lives, this translates into billions of claims records going back multiple years. Thanks to in-database analytics, where data doesn’t need to be constantly shuffled back and forth, this sort of complex analytics can happen effectively and efficiently.
3. Adverse Drug Reaction
It is a sad but true statement that adverse drug reactions (ADRs) rank among the top 5 killers in the country, ahead of pulmonary disease, AIDS, and automobile accidents. There are over 2 million serious ADRs annually, which is not surprising given that more than 40% of people aged 65+ are now taking over five different prescription medications. It is a statistically non-trivial problem to figure out the permutations/combinations of drugs that lead to an ADR. Drug companies cannot test all cases pre-launch, but do what is necessary to ensure safety of the drug in isolation. One of our customers was severely limited in their ability to perform this type of analysis. Data had to be sampled, moved, analyzed, and returned to the warehouse. With Fuzzy Logix’s in-database capability, the customer was able to analyze a much larger problem space, with a much larger data set, in a highly efficient manner.
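To give a feel for why the drug-combination search space is non-trivial, here is a minimal sketch that counts the distinct unordered combinations of drugs of each size. The catalog size of 1,000 drugs and the function name are illustrative assumptions, not figures from the customer engagement, and this is not how DB Lytix performs the analysis.

```python
from math import comb


def count_drug_combinations(n_drugs: int, max_size: int) -> dict[int, int]:
    """Count distinct unordered drug combinations of size 2..max_size."""
    return {k: comb(n_drugs, k) for k in range(2, max_size + 1)}


# Hypothetical catalog of 1,000 drugs; patients taking up to five at once.
counts = count_drug_combinations(1000, 5)
for size, n in counts.items():
    print(f"{size}-drug combinations: {n:,}")
```

Even with this modest hypothetical catalog, the number of candidate combinations grows by orders of magnitude with each additional drug, which is why sampling the data first can hide rare interactions.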
4. Equity Analysis
Analysts are typically interested in fast-moving computations on an intra-day basis. These computations might be things such as volume-weighted average price, momentum (moving averages and Bollinger bands), liquidity analysis, and efficient frontiers. What makes in-database analytics relevant here is the scale and performance. The computations might not be relevant or actionable if the results arrive too late. This is another scenario where combining the financial library present in DB Lytix with IIAS can offer a highly differentiated product.
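For readers unfamiliar with two of the measures named above, the following is a minimal Python sketch of volume-weighted average price and Bollinger bands. It is not the DB Lytix implementation, which runs inside the database; the function names, the sample prices and volumes, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def vwap(prices, volumes):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (middle, upper, lower) for every full window of prices.

    The middle band is a simple moving average; the outer bands sit
    num_std population standard deviations above and below it.
    """
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]
        mid = mean(chunk)
        spread = num_std * pstdev(chunk)
        bands.append((mid, mid + spread, mid - spread))
    return bands


# Made-up intra-day ticks, for illustration only.
prices = [10.0, 10.5, 10.2, 10.8, 11.0]
volumes = [100, 200, 150, 50, 100]
print(vwap(prices, volumes))
print(bollinger_bands(prices, window=3))
```

On a handful of ticks this is trivial; the difficulty described above comes from running the same arithmetic across every instrument and every time window fast enough for the result to still be actionable.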
5. Post Trade Risk Management
With volatility on the rise, post-trade risk management has taken on a new importance. The computational challenge here is that a typical firm may have a large number of diverse portfolios — each with a combination of equities/fixed-income/options and other instruments. Each one of these has to be simulated a large number of times in order to compute VaR (Value at Risk).
One of our customers has such a VaR process, which extracted all the positions data from a database, performed simulation and calculations outside the EDW, and stored the results back to the database. They use a Greek-based approach to calculate VaR, and on an average day there are 200,000 positions. This process took 3 hours to complete. Once we implemented the process in-database, it took less than 5 minutes to execute. Now they can run this VaR process several times a day, rather than only once a day.
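As a rough illustration of what a Greek-based VaR simulation involves, the sketch below approximates each position's profit and loss with a delta-gamma expansion and reads VaR off the simulated loss distribution. This is a simplified sketch, not the customer's process or the DB Lytix routine: the position fields, the independent normal price shocks, and the sample numbers are all assumptions.

```python
import random


def delta_gamma_pnl(delta, gamma, ds):
    """Second-order approximation: P&L = delta * dS + 0.5 * gamma * dS^2."""
    return delta * ds + 0.5 * gamma * ds * ds


def simulate_var(positions, n_sims=10_000, confidence=0.99, seed=42):
    """Monte Carlo VaR for a portfolio described by its Greeks.

    Assumes each position has its own underlying and that daily price
    shocks are independent and normally distributed (a simplification).
    """
    rng = random.Random(seed)
    pnls = []
    for _ in range(n_sims):
        total = 0.0
        for pos in positions:
            # One-day price move for this position's underlying.
            ds = pos["spot"] * pos["daily_vol"] * rng.gauss(0.0, 1.0)
            total += delta_gamma_pnl(pos["delta"], pos["gamma"], ds)
        pnls.append(total)
    pnls.sort()
    # VaR is the loss at the (1 - confidence) quantile, reported as positive.
    tail_index = int((1.0 - confidence) * n_sims)
    return -pnls[tail_index]


# Hypothetical two-position portfolio.
portfolio = [
    {"spot": 50.0, "daily_vol": 0.02, "delta": 100.0, "gamma": 0.0},
    {"spot": 120.0, "daily_vol": 0.015, "delta": -40.0, "gamma": 2.5},
]
print(simulate_var(portfolio))
```

Every simulated scenario touches every position, so the work scales with positions times simulations. At 200,000 positions that is a large amount of arithmetic, and it parallelizes well when it runs next to the data.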
Above is just a small sampling of the dozens of interesting use cases from customers on which Fuzzy Logix and IBM worked together. In several of these cases, the outcome has enabled the customer to reengineer their business processes and gain a competitive market edge, or better comply with regulation.
The fun does not stop here though. There are situations where a 10-100x improvement in performance might still not be enough. In a subsequent article, we will look at how together we are addressing that white space and accelerating analytics performance to 500-1000x using general-purpose graphics processing units (GPUs). Stay tuned.