Tech Update
Top 5 data mining trends for 2002-03
A closer look at the trends
By Aaron Zornes
META Group
January 9, 2002
Provided by META Group

More predictive models
Models are becoming more accurate as data mining tools pit the models and techniques they employ against one another and keep the best performers (analogous to the "genetic algorithms" of recent data mining tools). This improvement stems from powerful statistical techniques combined with newer capabilities such as association rules, which capture patterns such as: if a customer buys X and Y, the customer is also likely to buy Z. During 2002-03, such enhanced self-selecting intelligence will enable the combination and integration of several customer or transaction models into a single profile (a.k.a. universal customer view) to recommend the best actions.
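To make the "buys X and Y, also likely to buy Z" pattern concrete, here is a minimal sketch of how the strength of such an association rule is commonly measured, using support and confidence. The basket data and the function names are invented for illustration and do not come from any product mentioned in this article.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical shopping baskets; the item names X, Y, Z mirror the example in the text.
BASKETS: List[FrozenSet[str]] = [
    frozenset({"X", "Y", "Z"}),
    frozenset({"X", "Y", "Z"}),
    frozenset({"X", "Y"}),
    frozenset({"X", "Z"}),
    frozenset({"Y"}),
]


def support(itemset: Iterable[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) from basket counts."""
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        return 0.0
    joint_support = support(frozenset(antecedent) | frozenset(consequent), baskets)
    return joint_support / antecedent_support


# Rule under test: {X, Y} -> {Z}
rule_support = support({"X", "Y", "Z"}, BASKETS)
rule_confidence = confidence({"X", "Y"}, {"Z"}, BASKETS)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data, two of the three baskets containing both X and Y also contain Z, so the rule holds for roughly two-thirds of the customers it applies to. A real tool would search for such rules across many item combinations rather than test a single one.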

Better data mining models
Historically, most data mining tools relied heavily on sampling. Contemporary data mining products use scalable tree-based classifiers to build models against very large data sets, including massive transaction-detail data warehouses of 10TB to 20TB. Additionally, modern data mining tools are powerful enough to accommodate greater numbers of attributes/dimensions. They are also often front-ended by intelligent segmentation capabilities that accelerate the derivation of actionable customer segments. Marketing and risk management executives are thus able to use more data to build more accurate predictive models. During 2003-04, today's seemingly extreme data warehouses of 20TB to 50TB will be exceeded by select systems in the 100TB+ range.

More cost-effective modeling
Often, predictive models are produced by market scientists and statisticians using statistical software such as SAS, SPSS, or S+. To provide "actional" models and ROI, these data mining models must be integrated into front- and back-office systems, where the models and algorithms are invoked by Java or C++ programmers. Unfortunately, these two technical groups live in parallel universes (thinking differently and using different languages), and as a result there is too little coordination between the development and the deployment of these predictive algorithms. ROI is elusive, and the lack of synchronization between these groups actually decreases end users' faith in system accuracy and increases long-term costs for maintenance and integration. By 2003-04, data mining models and algorithms will increasingly be externalized in a business-rule language that is approachable and understandable to the corporate middle classes, much as decision trees currently help the layperson visualize complex problems better than neural nets do.
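As a rough illustration of what "externalizing" a model as business rules could look like, the sketch below flattens a small decision tree into IF/THEN statements that a non-statistician can read. The tree, its attribute names, its thresholds, and the rule wording are all hypothetical; no vendor's rule language is being reproduced here.

```python
from typing import List, Tuple, Union

# A leaf is a plain label; an internal node is
# (attribute, threshold, branch_if_below, branch_if_at_or_above).
Tree = Union[str, tuple]

# Hypothetical churn model, invented purely for illustration.
TREE: Tree = (
    "months_as_customer", 12,
    ("support_calls", 3, "retain", "likely to churn"),
    "retain",
)


def to_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Walk the tree depth-first and emit one readable rule per leaf."""
    if isinstance(tree, str):
        premise = " AND ".join(conditions) if conditions else "always"
        return [f"IF {premise} THEN {tree}"]
    attribute, threshold, below, at_or_above = tree
    return (
        to_rules(below, conditions + (f"{attribute} < {threshold}",))
        + to_rules(at_or_above, conditions + (f"{attribute} >= {threshold}",))
    )


RULES = to_rules(TREE)
for rule in RULES:
    print(rule)
```

Each path from the root to a leaf becomes one rule, so the statistician's model and the programmer's deployed logic can be reviewed from the same text.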

Evolving data mining standards
The Data Mining Group is a vendor consortium that uses a common XML-derived format called the Predictive Model Markup Language (PMML) to describe common predictive models, such as logistic regression, decision trees, and neural nets. The goal is to create models that can be used by other data mining and business intelligence applications without recoding by hand. Another goal is for PMML-described models to be safely infused, in real time, into 24x7 operational systems. This is not possible with proposed alternatives for model interchange, such as Microsoft's COM or the Object Management Group's CORBA framework.

Certain vendors (IBM, NCR, Oracle, and SPSS) have been working to provide deployment capability for PMML models in which the adapters/middleware are integrated once and new predictive models can be deployed later in real time. The goal is to dramatically reduce the cost of deploying new models and updating old ones. By 2004-05, a PMML-like capability will have arisen, either as an industry PMML standard or via one or more of the dominant vendors proliferating their own model interchange specification (likely candidates include SAS and SPSS).
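The "integrate once, deploy new models later" idea can be sketched as a generic scorer that reads its coefficients from an XML model description instead of having them hand-coded. The fragment below is a simplified, PMML-style regression description: the element and attribute names are modeled on PMML's regression elements, but the document is abbreviated, has not been validated against any PMML schema, and uses invented field names and coefficients.

```python
import math
import xml.etree.ElementTree as ET

# Simplified, PMML-style model description (an assumption for illustration only).
MODEL_XML = """
<PMML>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <RegressionTable intercept="-2.0">
      <NumericPredictor name="recent_purchases" coefficient="0.5"/>
      <NumericPredictor name="support_calls" coefficient="-0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def load_model(xml_text: str) -> dict:
    """Read the intercept and coefficients out of the XML description."""
    root = ET.fromstring(xml_text)
    model = root.find("RegressionModel")
    table = model.find("RegressionTable")
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("NumericPredictor")
    }
    return {
        "intercept": float(table.get("intercept")),
        "coefficients": coefficients,
        "logit": model.get("normalizationMethod") == "logit",
    }


def score(model: dict, record: dict) -> float:
    """Apply the linear model, then the logistic transform if requested."""
    linear = model["intercept"] + sum(
        coefficient * record[name]
        for name, coefficient in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-linear)) if model["logit"] else linear


MODEL = load_model(MODEL_XML)
PROBABILITY = score(MODEL, {"recent_purchases": 6, "support_calls": 4})
print(PROBABILITY)
```

Because `score` knows nothing about a particular model, swapping in a new XML document changes the predictions without touching the calling application, which is the cost reduction the vendors are pursuing.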

Integration within RDBMS servers
To further enhance the performance of running predictive models against massive data warehouses (as well as operational and real-time databases), certain data mining capabilities are going sub rosa and are being integrated into the relational database kernel. Examples include the integration of Darwin into Oracle, Intelligent Miner into DB2, and TeraMiner into the Teradata relational DBMS. Additionally, the native parallelism grafted into the RDBMS servers is, in turn, providing performance boosts for the data mining query optimizers. In the recent past, data mining vendors occasionally partnered with providers of software-based parallel processing schemes, such as Torrent Systems or Belmont. However, by 2004-05, parallelism will be a native feature of all high-end data mining capabilities.
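A small sketch can show why scoring inside the database is attractive: the model is expressed as SQL and evaluated where the rows live, so only predictions leave the server. The example uses Python's built-in `sqlite3` module purely as a stand-in for the RDBMS products named above; the table, columns, and two-split decision rule are hypothetical, and none of those vendors' actual interfaces are shown.

```python
import sqlite3

# In-memory SQLite database standing in for a data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, months_as_customer INTEGER, support_calls INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 6, 5), (2, 6, 1), (3, 30, 4)],
)

# A hypothetical two-split decision tree written as a CASE expression,
# so the database engine evaluates the model against every row.
SCORING_SQL = """
SELECT id,
       CASE
         WHEN months_as_customer < 12 AND support_calls >= 3
           THEN 'likely to churn'
         ELSE 'retain'
       END AS prediction
FROM customers
ORDER BY id
"""

PREDICTIONS = conn.execute(SCORING_SQL).fetchall()
for customer_id, prediction in PREDICTIONS:
    print(customer_id, prediction)
```

Once the model is just another expression in a query, the database's own optimizer and parallel execution apply to it, which is the performance argument made above.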

As data mining and predictive analytics become endemic to enterprises' very large databases, predictive modeling will concurrently emerge at the departmental and personal database level. By 2003-04, avatars (incarnations of an online alter ego or e-personality as a continuing entity) and wizards will provide "actional" insight into mainstream business activities via predictive modeling. At the same time, the restrictions of batch predictive modeling will dissolve as more analytical applications take on the flavor of "continuous business analytics," and as analytics themselves become built-in capabilities in all personal, workgroup, and enterprise applications.

Business impact
During 2002-03, as data mining becomes more mainstream for predictive modeling, marketers and risk managers will be able to make far better-informed decisions from their customer data.

Bottom line
Data mining and predictive analytics provide enterprises the most detailed breakdown possible of customer data warehouses--which should enable better customer grouping into a meaningful variety of actionable segments.

First published by META Group on Dec. 13, 2001


