Data mining and business intelligence

xiaoxiao, 2021-03-06


Frank1982

Abstract: In today's increasingly competitive markets, the hope of large profits lies in the vast sea of business data and related data. Only enterprises that use advanced information technology to collect, analyze, and understand information, and to make decisions based on it, will win in the market. As a result, more and more managers are turning to business intelligence technology to discover problems in business operations and find favorable solutions.

Keywords: data mining; business intelligence.

1. Data mining overview

1.1 Introduction

Over the past decade, advances in science and technology have brought great progress to the economy and society. At the same time, enormous amounts of data have been produced in every field, from space exploration to the huge volume of daily bank transactions. How to process this data usefully is an obvious question, and people have explored it. The rapid development of computer technology made it possible to handle such data and drove great progress in database technology. But faced with an ever-rising tide of data, people are no longer satisfied with the query functions of databases and have raised a deeper question: can information or knowledge be extracted from the data to serve decision-making? Database technology alone is powerless here, and traditional statistical techniques also face great challenges. New methods for handling massive data were urgently needed. Thus, by combining statistics, databases, machine learning, and other technologies, people proposed data mining to solve this problem.

1.2 What is data mining?

Although data mining has a short history, it has developed very rapidly since the 1990s. Because it is a multidisciplinary field, there is no single complete definition, and various definitions have been proposed, for example:

SAS Institute (1997): "An advanced method of exploring large amounts of related data and building correlation models."

Bhavani (1999): "Using pattern recognition, statistical, and mathematical techniques to discover meaningful new relationships, patterns, and trends in data."

Hand et al. (2000): "Data mining is the process of finding meaningful and valuable information in large databases."

Specifically, data mining, also known as Knowledge Discovery in Databases (KDD), refers to extracting implicit, previously unknown, non-trivial, and potentially useful information or patterns from a large database or data warehouse. It is a new field of database research that combines theory and techniques from databases, artificial intelligence, machine learning, statistics, and other disciplines.

1.3 Main functions of data mining

Data mining integrates many disciplines and has many functions. The main ones at present are as follows:

Ø Classification: Based on the attributes and features of the objects under analysis, establish different classes that describe them. For example, a bank divides its customers into categories based on historical data, and can then classify a new loan applicant and offer the corresponding loan plan.

Ø Clustering: Identify the intrinsic patterns in the data and divide the objects into several groups according to these patterns. For example, loan applicants can be divided into high-risk, medium-risk, and low-risk groups.

Ø Association rule and sequential pattern discovery: An association is a connection between things that tend to occur together. For example, people who buy beer every day may also buy cigarettes; the strength of this connection can be described by the support and confidence of the association. Unlike an association, a sequence is a connection over time. For example, the bank adjusts interest rates today and the stock market changes tomorrow.

Ø Prediction: Capture the patterns by which the object under analysis develops and forecast future trends. For example, forecasting future economic development.

Ø Deviation detection: Describe the few extreme or unusual cases among the objects under analysis and reveal their underlying causes. For example, if 500 of a bank's 1 million transactions are fraudulent, the bank can investigate the factors behind these 500 cases in order to operate more safely and reduce future risk.
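As a minimal sketch of how support and confidence quantify an association such as the beer and cigarettes example, the Python below counts them over a handful of invented transactions. Support is the fraction of all transactions that contain both items; confidence is the fraction of transactions containing the antecedent that also contain the consequent. The data and function names are illustrative assumptions.

```python
# Hypothetical transactions: each set is one customer's purchases on one day.
transactions = [
    {"beer", "cigarettes", "chips"},
    {"beer", "cigarettes"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"beer", "cigarettes", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as count(A and B) / count(A)."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


rule_support = support({"beer", "cigarettes"}, transactions)
rule_confidence = confidence({"beer"}, {"cigarettes"}, transactions)
print(f"beer -> cigarettes: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Here 3 of the 5 baskets contain both items (support 0.60), and 3 of the 4 baskets with beer also contain cigarettes (confidence 0.75).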

Note that the functions of data mining do not exist independently; they are interconnected in practice.

1.4 Data Mining Methods

As an emerging technique for processing data, data mining has several distinctive features. First, it deals with massive amounts of data, which is the very reason data mining exists. Second, the data may be incomplete, noisy, and random, with complex structures and high dimensionality. Finally, data mining is interdisciplinary, drawing on statistics, computer science, mathematics, and other fields. The following are common, widely used algorithms and models:

Ø Traditional statistical methods: (1) Sampling: with so much data, analyzing all of it is impractical and unnecessary, so a reasonable sample is drawn under the guidance of theory. (2) Multivariate statistical analysis, such as factor analysis and cluster analysis. (3) Statistical forecasting methods, such as regression analysis and time series analysis.

Ø Visualization: Use graphics such as histograms to explore the characteristics of the data intuitively; many of these draw on descriptive statistics. One open challenge for visualization is displaying high-dimensional data.

Ø Decision trees: Split the data using a series of rules to build a tree that can be used for classification and prediction. Common algorithms include CART, CHAID, ID3, C4.5, and C5.0.

Ø Neural networks: Simulate the function of neurons in the human brain. Data passes through an input layer, one or more hidden layers, and an output layer, and the network adjusts its weights through computation until it produces a result. Neural networks are used for classification and regression.

Ø Genetic algorithms: An optimization technique based on the theory of natural evolution that simulates gene recombination, mutation, and selection.

Ø Association rule mining algorithms: Association rules describe relationships among data items, in the form "A1 ∧ A2 ∧ ... ∧ An → B1 ∧ B2 ∧ ... ∧ Bm". Mining them generally takes two steps: (1) find the frequent (large) itemsets; (2) generate association rules from the frequent itemsets.
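The two steps can be sketched in Python as follows. This is a simplified, Apriori-style implementation written for illustration; the function names and the toy basket data are assumptions, not taken from any particular tool, and it counts support by scanning the data directly rather than using the optimizations a production miner would need.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Step 1: level-wise (Apriori-style) search for itemsets with support count >= min_count."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets, then prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_conf):
    """Step 2: split each frequent itemset X into A -> X - A; keep rules with confidence >= min_conf."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = n / frequent[lhs]  # every subset of a frequent itemset is itself frequent
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules


# Hypothetical baskets; min_count=2 means an itemset must appear in at least 2 of the 4 baskets.
baskets = [
    {"bike", "helmet", "gloves"},
    {"bike", "helmet"},
    {"bike", "bell"},
    {"helmet", "gloves"},
]
freq = frequent_itemsets(baskets, min_count=2)
rules = association_rules(freq, min_conf=0.6)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

With these baskets, {bike, helmet} and {gloves, helmet} are frequent, while {bike, gloves} appears only once and is discarded, so no three-item set survives the pruning step.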

In addition to the common methods above, there are rough set methods, fuzzy set methods, Bayesian belief networks, and the nearest-neighbor algorithm (KNN).
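For the nearest-neighbor method mentioned above, a minimal sketch looks like this, assuming a small invented training set of loan applicants described by two numeric features. It uses unscaled Euclidean distance for brevity; in practice, features on different scales would normally be standardized first.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical applicants: (income in $1000s, years of credit history) -> risk label.
train = [
    ((25, 1), "high"), ((30, 2), "high"), ((28, 1), "high"),
    ((60, 8), "low"), ((75, 10), "low"), ((68, 7), "low"),
]
print(knn_predict(train, (27, 2)))  # its three nearest neighbors are all labeled "high"
print(knn_predict(train, (70, 9)))  # its three nearest neighbors are all labeled "low"
```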

1.5 Implementation of data mining

Having discussed the definition, methods, and tools of data mining, the key question now is implementation. The general steps are as follows:

Problem understanding and formulation -> Data preparation -> Data preprocessing -> Model building -> Evaluation and interpretation

Ø Problem understanding and formulation: The most basic task at the start of data mining is to understand the data and the actual business problem, and on that basis to pose questions with clearly defined goals.

Ø Data preparation: Obtain the raw data, draw a subset of suitable size, and build a data mining database. If the existing data warehouse already meets the requirements of data mining, it can serve directly as the data mining database.

Ø Data preprocessing: Because the data may be incomplete, noisy, and random, with complex structures, it must first be organized: clean incomplete data, perform a preliminary descriptive analysis, and select or transform the variables relevant to the mining task.

Ø Model building: Choose an appropriate model based on the goal of the mining task and the characteristics of the data.

Ø Evaluation and interpretation: Evaluate the mining results, select the best model, apply it to the practical problem, and interpret the results using domain expertise.

These steps are not completed in a single pass; some or all of them may need to be repeated.
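As an illustration of the data preprocessing step, the sketch below drops incomplete records, computes preliminary descriptive statistics, and standardizes a variable. The records and helper names are hypothetical, and dropping rows is only one option; imputing missing values is a common alternative.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 34, "income": 52000, "region": "EU"},
    {"age": None, "income": 48000, "region": "US"},
    {"age": 45, "income": 61000, "region": "US"},
    {"age": 29, "income": None, "region": "EU"},
    {"age": 51, "income": 75000, "region": "AU"},
]


def clean(records, fields):
    """Drop records that are missing any of the required fields."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def describe(values):
    """Preliminary descriptive statistics for one numeric variable."""
    return {"n": len(values), "mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def standardize(values):
    """Transform a variable to z-scores so variables on different scales become comparable."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]


rows = clean(raw, ["age", "income"])
incomes = [r["income"] for r in rows]
print(describe(incomes))
print([round(z, 2) for z in standardize(incomes)])
```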

1.6 Current Applications of Data Mining

The problem data mining addresses is to find valuable hidden patterns in huge databases, analyze them to obtain meaningful information, and summarize useful structures as a basis for business decisions. Its applications are very broad: any industry with databases worth analyzing and a need for analysis can use mining tools to explore its data. Common applications occur in retail, manufacturing, finance and insurance, telecommunications, and healthcare:

Ø Retailers discover relationships among the goods customers buy and issue discount coupons to increase sales.

Ø Insurance companies build predictive models through data mining to identify likely fraud, avoid moral hazard, reduce costs, and increase profits.

Ø In manufacturing, semiconductor production and testing generate large amounts of data that must be analyzed to identify problems and improve quality.

Ø As e-commerce grows in importance, data mining can be used to analyze websites, identify user behavior patterns, retain customers, provide personalized services, and optimize site design.

This article explains the current applications and future prospects of data mining through its use in business intelligence.

2. Business intelligence overview

2.1 Introduction

As early as the late 1990s, business intelligence was selected by an authoritative computer magazine as one of the most influential IT technologies of the coming years. Although the IT industry as a whole has not been booming in recent years, research, development, and application of business intelligence are on the rise, hundreds of IT companies continue to flock to this emerging field, and BI applications have even become a new highlight of the IT world. What is business intelligence technology, and what technologies support such promising BI applications?

2.2 Introduction to Business Intelligence

Business intelligence is neither a basic technology nor a product technology. It is an application technology formed by applying data warehousing, online analytical processing (OLAP), data mining, and related technologies to business.

A business intelligence system converts raw business data into information for enterprise decision-making. Unlike general information systems, it stands out in handling massive data, analyzing data, and presenting information.

Figure 1. Business intelligence system architecture

A business intelligence system has four main stages: data preprocessing, data warehouse construction, data analysis, and data presentation. Data preprocessing is the first step in integrating the enterprise's raw data and consists of three processes: extraction, transformation, and loading (ETL). The data warehouse is the foundation for handling massive data. Data analysis is the key to the system's intelligence and generally uses two major technologies: online analytical processing and data mining. OLAP not only summarizes and aggregates data but also provides analysis operations such as slice, dice, drill-down, roll-up, and pivot, so users can easily analyze massive data. The goal of data mining is to uncover the knowledge hidden behind the data, build analysis models through methods such as association analysis, clustering, and classification, and predict future trends and the problems the business will face. Building on the massive data and these analysis techniques, data presentation visualizes the system's analysis results. Data warehousing, OLAP, and data mining are generally considered the three major components of business intelligence.

2.3 Main processes of business intelligence

Define requirements -> Collect information -> Sample data -> Clean and transform -> Analyze and refine -> Archive information -> Distribute information -> Use and feedback

Finally, decision makers make appropriate use of the business intelligence results and feed back the outcomes. Feedback can reveal potential problems, improve the quality of the business intelligence process based on actual cases, and express new requirements.

Business intelligence arose on the basis of computer hardware and software, networking, communications, decision support, and other technologies. It needs to discover patterns in data sources (databases, data warehouses, the web, and so on), which relies mainly on data mining, because data mining extracts implicit, previously unknown, potentially interesting, and valuable knowledge and rules for decision-making. These rules capture specific relationships among sets of objects in the database, reveal useful information, and provide a basis for business decisions, market planning, and financial forecasting.

2.4 Industry Applications of Business Intelligence

Manufacturing: Companies can act more proactively in sales and marketing to attract customers; forecast demand from point-of-sale scan data to order and replenish stock in time; use procurement and supplier analysis to understand, in real time, cost differences among suppliers and agents; and optimize scheduling, distribution, and transportation to keep inventory low.

Insurance: Based on historical data such as policy types, policyholders, and premiums, insurers can set reserve amounts reasonably and analyze claim standards; analyze customer needs and consumption characteristics; analyze risk and profit or loss; and tailor services to customer psychology.

Banking, finance, and securities: Analyze each customer's current and long-term overall value; use cost and sales data covering a year or more to lay the foundation for high-profit sales and banking services; build credit allocation models; provide early warnings that help customers avoid credit crises, and offer credit management measures when a customer's credit deteriorates; provide more accurate portfolio assessments; predict the impact of credit policy changes to reduce credit losses; and improve customer loyalty.

Telecommunications: Customer profiling and segmentation, and demand forecasting.

3. Use data mining to improve business intelligence

3.1 Overview

The ability to integrate data and analyze it quickly and accurately, and thereby make better business decisions, can give a company a competitive advantage. How to discover and exploit this advantage is the subject of business intelligence research.

Every good business decision needs facts and figures to support it, and its correctness depends on the accuracy of the facts and figures used. As competition intensifies, decisions must be made in less time, so obtaining as much information as possible within that time becomes ever more important. At the same time, the amount of data needed to make a well-founded decision keeps growing, and covering more decision alternatives takes longer. Automatic data analysis tools are therefore needed to reduce the time required to analyze large amounts of data accurately, and data mining is a very useful technology for this. The following case shows how data mining improves business intelligence.

3.2 Case Analysis

In this case study, a sporting goods company called "Sports Boutique" has sales offices in 7 countries and is headquartered in Sydney.

3.2.1 Data Analysis

The development of data analysis techniques can be divided into three stages: report queries, online analytical processing (OLAP), and data mining. Figure 2 shows these three stages as they apply in this case.

Figure 2. Three stages of data analysis

Each sales location had built its own solution for managing the sales information of its region. To increase sales, the vice president of sales decided to run an incentive program rewarding the units with the highest sales, by region and by product, and asked the CIO for two such reports. To the CIO this looked simple, but much work had to be done before the reports could be generated. The following issues had to be handled in a short time: sales data in different regions was stored in different types of databases, data formats differed between regions, and each country used its own currency.

First, all data had to be consolidated at headquarters. Putting all the data in one place makes it easier to query the combined data set. This central repository of all relevant information is called a data warehouse. When the same concept is applied to a single department of the company, that department's information repository is called a data mart.

Without a data warehouse tool, the CIO might spend months completing these tasks. With a data warehouse tool such as IBM Visual Warehouse V3.1, however, they can be completed quickly and automatically.

In Visual Warehouse, how data is accessed, extracted, transformed, and enriched is defined through business views. A business view works like a template: once defined, it can be reused to perform these steps consistently. The "Sales by region" business view therefore defines where to extract the data from, which exchange-rate table to use for currency conversion, and which summary data to precompute. The "Sales by product" view is defined in the same way.
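The idea of a business view can be sketched in plain Python: extract rows from regional sources, convert each amount to a single currency using an exchange-rate table, and precompute totals by region or by product. This does not reproduce Visual Warehouse's actual interface; the rows, the hypothetical `business_view` function, and the exchange rates (chosen for round numbers, not real rates) are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical exchange-rate table: one unit of local currency in US dollars (not real rates).
USD_RATE = {"USD": 1.0, "EUR": 1.25, "AUD": 0.5}

# Hypothetical rows extracted from regional systems, each in its own currency.
regional_sales = [
    {"region": "Seattle", "product": "Mountain bike", "qty": 4, "amount": 2000, "currency": "USD"},
    {"region": "Hannover", "product": "Helmet", "qty": 10, "amount": 400, "currency": "EUR"},
    {"region": "Sydney", "product": "Helmet", "qty": 5, "amount": 300, "currency": "AUD"},
]


def business_view(rows, key):
    """Hypothetical business view: convert each row to USD, then precompute totals grouped by `key`."""
    totals = defaultdict(float)
    for row in rows:
        usd = row["amount"] * USD_RATE[row["currency"]]  # transform: unify the currency
        totals[row[key]] += usd                          # load: accumulate the precomputed summary
    return dict(totals)


print(business_view(regional_sales, "region"))   # "Sales by region"
print(business_view(regional_sales, "product"))  # "Sales by product"
```

Because the conversion and grouping rules live in one function, the same "template" can be rerun whenever new regional data arrives.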

Once the data warehouse is built, any front-end tool, such as Lotus Approach or Microsoft Access, can be used to view the actual reports.

3.2.2 Online Analytical Processing

The reports showed that Seattle, in the US, had the highest sales and that the mountain bike was the best-selling product. After seeing these reports, the vice president responsible for helmet sales decided to find out whether the new data warehouse could provide more information to help him increase sales.

When the helmet sales vice president talked to the CIO, the CIO recommended ad hoc analysis: viewing the data in different ways to reveal information that was not visible before. This is called online analytical processing (OLAP) or multidimensional analysis (MDA). Two main multidimensional analysis techniques are used here: the first is drill-down, and the second is slice and dice. The CIO helped the vice president use the Lotus Approach front-end MDA tool to query the data warehouse.

The multidimensional analysis below involves five dimensions: product, sales amount, quantity, region, and time. All the data viewed is for January. The vice president responsible for helmet sales asked the following questions:

1. In which region did helmets sell best in January?

2. In January, which country in that region was the best-selling area for helmets?

3. Which city in that leading country had the highest helmet sales?

To answer these questions, the CIO used Lotus Approach to drill down into the region (Location) dimension, viewing more detailed data within that specific dimension. In Lotus Approach, the CIO built a crosstab matrix with the region on the Y-axis and sales on the X-axis.

The resulting histograms showed that Europe was the best-selling region for helmets; within Europe, Germany was the best-selling country for helmets in January; and within Germany, Hannover was the best-selling city for helmets.
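The drill-down from region to country to city can be sketched as repeated aggregation over a location hierarchy. In the data below, the Hannover and Seattle helmet quantities come from the case; every other row is invented, and `drill_down` is a hypothetical helper, not a Lotus Approach function.

```python
from collections import Counter

LEVELS = ("region", "country", "city")

# January helmet sales: (region, country, city, quantity). Only Hannover and Seattle are from the case.
helmet_sales = [
    ("Europe", "Germany", "Hannover", 445),
    ("Europe", "Germany", "Munich", 210),
    ("Europe", "France", "Paris", 300),
    ("North America", "USA", "Seattle", 360),
    ("Asia Pacific", "Australia", "Sydney", 280),
]


def drill_down(rows, level, **filters):
    """Total quantity at hierarchy `level`, restricted to rows matching `filters` (e.g. region="Europe")."""
    idx = LEVELS.index(level)
    totals = Counter()
    for row in rows:
        if all(row[LEVELS.index(name)] == value for name, value in filters.items()):
            totals[row[idx]] += row[3]
    return totals.most_common()  # sorted from best-selling to worst-selling


print(drill_down(helmet_sales, "region"))                      # top level
print(drill_down(helmet_sales, "country", region="Europe"))    # drill into Europe
print(drill_down(helmet_sales, "city", country="Germany"))     # drill into Germany
```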

From the earlier report, the helmet vice president also knew that the mountain bike was the best-selling product in Seattle. He therefore wanted to compare mountain bike and helmet sales in Seattle and Hannover. Comparing sales amounts (the Dollars column) would be misleading, because the two products have different prices, so he compared the quantities sold instead.

He found that although mountain bikes sold the most in Seattle, helmet sales there were unsatisfactory: the ratio of helmets to mountain bikes was about 1:5 (360:1804). In Hannover, however, the ratio was almost 1:1 (445:436). He then recalled that the Seattle store did not display mountain bikes and helmets close together as the Hannover store did. He decided to place the two products together in Seattle and to monitor daily sales of helmets and mountain bikes.
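The comparison is simple arithmetic on the January quantities reported in the case; using quantities rather than dollars removes the effect of the two products' different prices.

```python
# January quantities reported in the case: (helmets sold, mountain bikes sold).
quantities = {"Seattle": (360, 1804), "Hannover": (445, 436)}

# Mountain bikes sold per helmet sold; a value near 1.0 means roughly one helmet per bike.
ratios = {city: bikes / helmets for city, (helmets, bikes) in quantities.items()}
for city, ratio in ratios.items():
    print(f"{city}: {ratio:.2f} mountain bikes per helmet")
```

Seattle sells about five bikes per helmet while Hannover sells roughly one, which is the gap the vice president noticed.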

Viewing data across different columns in this way is multidimensional analysis, and performing it interactively is online analytical processing. Data for OLAP can be stored in a multidimensional database (MDD) or a relational database (RDBMS). OLAP on data stored in a multidimensional database is called multidimensional OLAP (MOLAP); OLAP on data stored in a relational database is called relational OLAP (ROLAP); and OLAP on data stored in both kinds of database is called hybrid OLAP (HOLAP).
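To make the ROLAP idea concrete, the sketch below stores a small fact table in SQLite and answers a slice-and-dice question with ordinary SQL. Only the January Hannover and Seattle quantities come from the case; the table layout and the gloves and February rows are invented assumptions.

```python
import sqlite3

# Hypothetical fact table in an in-memory relational database (the ROLAP setting).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, product TEXT, country TEXT, city TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Jan", "Helmet", "Germany", "Hannover", 445),
        ("Jan", "Mountain bike", "Germany", "Hannover", 436),
        ("Jan", "Helmet", "USA", "Seattle", 360),
        ("Jan", "Mountain bike", "USA", "Seattle", 1804),
        ("Jan", "Gloves", "USA", "Seattle", 120),  # invented row, excluded by the product filter
        ("Feb", "Helmet", "USA", "Seattle", 390),  # invented row, excluded by the month filter
    ],
)

# Slice: fix the time dimension to January.
# Dice: restrict the product dimension to two values, then aggregate quantity by city.
query = """
    SELECT city, product, SUM(qty)
    FROM sales
    WHERE month = 'Jan' AND product IN ('Helmet', 'Mountain bike')
    GROUP BY city, product
    ORDER BY city, product
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```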

In this case, five data dimensions are used: time, sales amount, quantity, product, and region.

3.2.3 Data Mining

The previous sections showed different ways to process and analyze data. They can answer specific questions, but only those questions. By viewing the data in a particular way, we found that helmet and mountain bike sales did not move together. The association does exist in the data warehouse, but without drill-down and slice-and-dice techniques we could not have found it, and finding it that way takes a great deal of time.

Data mining addresses these problems systematically. It not only lets users verify hypotheses but also lets them discover new information without the manual effort described above. IBM's Intelligent Miner for Data and Intelligent Miner for Text are data mining tools. The former searches structured data, such as a company's transaction data; the latter searches text data, such as documents in a library. This case uses Intelligent Miner for Data.

Intelligent Miner contains six main algorithms: association, sequential patterns, prediction, classification, clustering, and deviation detection.

The company's CEO now wondered whether they were asking the right questions. The vice president had happened upon the fact that mountain bike and helmet sales were not 1:1; what other similar issues might there be? The CEO might think of two such questions:

1. Which products are customers who buy a mountain bike most likely to buy as well?

2. How many times do customers who buy a gas cylinder come back to refill it?

Intelligent Miner for Data gave the following answer: helmets, with a likelihood of 92%; gloves, 62%; bells, 23%; speedometers, 13%.

Intelligent Miner for Data would likely use its association algorithm to obtain this answer. The association algorithm discovers connections between products. Based on the answer, salespeople can be given a list of the top three associated products to suggest when selling a specific product. For example, when selling a mountain bike, a salesperson can recommend a helmet, gloves, and a bell.
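One plausible reading of these percentages is as conditional probabilities: of the transactions that include a mountain bike, the share that also include each other item. A minimal sketch of ranking co-purchased items that way and keeping the top three, using invented baskets and a hypothetical `recommend` helper (not Intelligent Miner's API):

```python
from collections import Counter


def recommend(transactions, product, top_n=3):
    """Rank co-purchased items by estimated P(item | product) and return the top_n pairs."""
    baskets = [t for t in transactions if product in t]
    counts = Counter(item for t in baskets for item in t if item != product)
    return [(item, n / len(baskets)) for item, n in counts.most_common(top_n)]


# Hypothetical transactions.
transactions = [
    {"mountain bike", "helmet", "gloves", "bell"},
    {"mountain bike", "helmet", "gloves"},
    {"mountain bike", "helmet", "gloves", "bell"},
    {"mountain bike", "helmet", "speedometer"},
    {"helmet", "gloves"},
]
for item, p in recommend(transactions, "mountain bike"):
    print(f"{item}: {p:.0%}")
```

The lowest-ranked item (the speedometer here) falls off the list, mirroring the case's "top three" directory for salespeople.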

Based on these answers, the company could also take the following actions: train sales partners on rider safety (which can boost sales of reflective clothing, lights, and rear-view mirrors); promote other related products, such as water bottles, speedometers, and personal audio players; run bundled sales campaigns; develop cross-department promotions (such as leisure products and apparel used with mountain bikes and gloves); display gloves in the mountain bike showroom; record and reward success stories of selling the most accessories; and publish the most interesting accessory products each month.

For the CEO's second question, Intelligent Miner for Data gave the following answer: 12% of customers who bought a gas cylinder refilled it only once, 8% came back for 2 refills, and 7% came back for more than 2 refills.

Based on these results, the company could make one of two decisions: reconsider the performance of this product line and drop the refill business, or offer 25% off the next two refills to improve the refill business's sales performance. If it kept the refill business, with the goal of raising sales to one refill per cylinder per year, it could also take other actions: offer incentives to existing and new cylinder owners; mail customers in spring to remind them to come back for refills; and set up more convenient refill stations in customer parking lots, with a coupon issued for every refill.

After 3 months, the company saw the following results: quarterly turnover rose 34% and income rose 32%; average sales revenue per mountain bike transaction rose 29%; buying a helmet with a mountain bike became fashionable (helmet sales rose at every sales location); glove sales rose 15%; mountain bike accessory sales rose 51%; success stories of bundled sales became very common at sales outlets (with a monthly reward for the most successful case); and gas cylinder refill sales began to rise (to date, sales have doubled compared with last year).

Thus, using data mining to improve business intelligence ultimately leads to higher sales and better profits.

4. Business intelligence prospects

According to market analysts, business intelligence has become the most important area of corporate information technology, and the one with the greatest potential. Why? In today's economic environment, large companies in every market need extra leverage to survive, and that leverage usually comes from key decision makers being able to quickly access the business information they need to assess market conditions.

It is no wonder that International Data Corp. (IDC), one of the earliest global market intelligence and consulting firms in the IT industry, predicted that the worldwide market for business intelligence systems would double by 2006 to about $14 billion. Other analysts, such as Meta Group, also believe that the focus of the database industry is shifting from transaction processing to business analysis and data warehousing.

