Data Mining

Reflection on Data Mining experience, and the advantages and disadvantages of data mining.

Well, basically, we have not done the data mining workshop yet, so I don’t have any reflections on the hands-on experience for now. I will blog about it after the workshop.

Here are my findings about data mining, including its pros and cons.

Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using tools such as classification, association rule mining, clustering, etc.

If a clothing store records the purchases of customers, a data mining system could identify those customers who favour silk shirts over cotton ones.

Source: http://en.wikipedia.org/wiki/Data_mining

Companies now use data mining techniques to examine their databases, looking for trends, relationships, and outcomes, in order to enhance their overall operations and discover new patterns that may allow them to better serve their customers.

Data mining is a term used to describe the process of discovering patterns and trends in large data sets (knowledge discovery in databases) in order to find information that is useful for decision making.

How data mining works

Data --> Target Data --> Preprocessed Data --> Transformed Data --> Patterns --> Knowledge --> Make decision

Data mining is also called knowledge discovery from databases.

Before a data set can be mined, it first has to be “cleaned”. This cleaning process removes errors, ensures consistency and takes missing values into account. Next, computer algorithms are used to “mine” the clean data looking for unusual patterns. Finally, the patterns are interpreted to produce new knowledge.
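To make the cleaning step a little more concrete, here is a minimal Python sketch. The field names (`age`, `occupation`, `children`), the age limits and the choice of filling gaps with the median are all my own assumptions for illustration, not something the sources prescribe.

```python
from statistics import median

def clean_records(records):
    """Return cleaned copies of the records (the input list is not modified)."""
    # Remove errors: keep only records whose age is a plausible whole number.
    valid = [dict(r) for r in records
             if isinstance(r.get("age"), int) and 0 < r["age"] < 120]
    # Ensure consistency: same spelling and case for the same occupation.
    for r in valid:
        r["occupation"] = r["occupation"].strip().lower()
    # Take missing values into account: fill gaps with the median of known values.
    known = [r["children"] for r in valid if r["children"] is not None]
    fill = median(known) if known else 0
    for r in valid:
        if r["children"] is None:
            r["children"] = fill
    return valid

# Hypothetical raw data with one impossible age and one missing value.
raw = [
    {"age": 34, "occupation": " Teacher ", "children": 2},
    {"age": -5, "occupation": "clerk", "children": 0},
    {"age": 51, "occupation": "ENGINEER", "children": None},
    {"age": 28, "occupation": "teacher", "children": 0},
    {"age": 45, "occupation": "nurse", "children": 1},
]
cleaned = clean_records(raw)
```

Only after a step like this would the mining algorithms be run on `cleaned`.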

Example: an insurance company holds records with each customer’s age, sex, marital status, occupation and number of children. It can mine part of the database to identify the kind of customer who took out a particular kind of insurance policy, and so derive rules describing a good candidate for that policy. These rules are then used to identify such customers in the remainder of the database. Next, another algorithm is used to sort the database into clusters, or groups of people with similar attributes, in the hope that these might reveal interesting and unusual patterns.

Once a pattern is revealed, it can be communicated to the marketing department, which sends out letters offering or suggesting a certain insurance policy to those customers.
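The rule step of the insurance example could look roughly like the sketch below. It only covers learning a rule from customers whose policy status is known and applying it to the rest; the clustering step is left out. The function names, the single-attribute rule and the 60% threshold are hypothetical choices of mine.

```python
from collections import defaultdict

def learn_rule(labelled, attribute, min_rate=0.6):
    """Return the values of `attribute` whose holders bought the policy
    at least `min_rate` of the time."""
    counts = defaultdict(lambda: [0, 0])  # value -> [buyers, total]
    for person in labelled:
        tally = counts[person[attribute]]
        tally[0] += int(person["has_policy"])
        tally[1] += 1
    return {value for value, (buyers, total) in counts.items()
            if buyers / total >= min_rate}

def find_candidates(unlabelled, attribute, good_values):
    """Apply the learned rule to the remainder of the database."""
    return [p["name"] for p in unlabelled if p[attribute] in good_values]

# Hypothetical customers whose policy status is already known.
labelled = [
    {"marital_status": "married", "has_policy": True},
    {"marital_status": "married", "has_policy": True},
    {"marital_status": "married", "has_policy": False},
    {"marital_status": "single", "has_policy": False},
    {"marital_status": "single", "has_policy": True},
    {"marital_status": "single", "has_policy": False},
]
# Hypothetical remainder of the database.
rest = [
    {"name": "A", "marital_status": "married"},
    {"name": "B", "marital_status": "single"},
    {"name": "C", "marital_status": "married"},
]
good_values = learn_rule(labelled, "marital_status")
candidates = find_candidates(rest, "marital_status", good_values)
```

The `candidates` list is what would be handed to the marketing department for the mailing.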

Marketing/Retailing
Data mining can aid direct marketers by providing them with useful and accurate trends about their customers’ purchasing behavior.

Based on these trends, retailers can give their customers more attention. For example, in the Wal-Mart case, the pattern reportedly showed that men who buy diapers also tend to buy beer at the same time. Based on this information, retailers can make better decisions about how to arrange their products on the shelves.
As a result, the manager began to stack the beer next to the diaper shelf. This improves the customers’ shopping experience and should increase sales.
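A pattern like "diapers and beer" is usually measured with support, confidence and lift, the standard measures in association rule mining. Below is a small sketch that computes them for one rule. The baskets are made-up data, not Wal-Mart figures, and `rule_metrics` is just a name I picked.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    cons = sum(1 for t in transactions if consequent in t)
    support = both / n                # share of baskets containing both items
    if ante == 0 or cons == 0:
        return support, 0.0, 0.0      # rule cannot be evaluated
    confidence = both / ante          # share of antecedent baskets with the consequent
    lift = confidence / (cons / n)    # above 1 means bought together more than by chance
    return support, confidence, lift

# Hypothetical shopping baskets.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(support, confidence, lift)  # 0.375 0.75 1.5
```

In this toy data, three out of four diaper baskets also contain beer, and a lift of 1.5 means the pair turns up together 1.5 times as often as it would if the two purchases were unrelated.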

Banking/Crediting
Data mining can assist financial institutions in areas such as credit reporting and loan information.

For example, by examining previous customers with similar attributes, a bank can estimate the level of risk associated with each loan. If the bank brings out a new credit card, it can work out which customers are most likely to take it up. Marketers can then use the data mining results to call or write to existing customers to promote the new product (the credit card).

It also helps credit card issuers reduce their losses, as data mining can assist them in detecting potentially fraudulent credit card transactions. Such customers may be rejected by the bank when they apply for a loan, because the database keeps their records, history and transactions.
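One simple way to "examine previous customers with similar attributes" is a nearest-neighbour estimate: look up the k most similar past customers and see how many of them defaulted. This is only a sketch of that one technique, not how any particular bank does it. The two features (income and debt ratio, both already scaled to the range 0 to 1) and the data are my assumptions.

```python
import math

def estimate_default_risk(history, applicant, k=3):
    """Fraction of the k most similar past customers who defaulted."""
    if not history:
        raise ValueError("need at least one past customer")
    # Similarity here is plain Euclidean distance between feature tuples.
    nearest = sorted(history,
                     key=lambda past: math.dist(past["features"], applicant))[:k]
    return sum(past["defaulted"] for past in nearest) / len(nearest)

# Hypothetical past customers: features are (income, debt ratio), scaled to 0..1.
history = [
    {"features": (0.9, 0.1), "defaulted": False},
    {"features": (0.8, 0.2), "defaulted": False},
    {"features": (0.7, 0.3), "defaulted": False},
    {"features": (0.2, 0.8), "defaulted": True},
    {"features": (0.3, 0.9), "defaulted": True},
    {"features": (0.1, 0.7), "defaulted": True},
]
low_risk = estimate_default_risk(history, (0.85, 0.15))
high_risk = estimate_default_risk(history, (0.25, 0.80))
```

With this data the first applicant sits among customers who all repaid, and the second among customers who all defaulted, so the two estimates end up at opposite extremes.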

Law enforcement
Data mining can aid law enforcers in identifying criminal suspects as well as apprehending these criminals by examining trends in location, crime type, habit, and other patterns of behaviours.

Researchers
Data mining can assist researchers by speeding up their data analysis, thus allowing them more time to work on other projects.

Disadvantages:

Privacy Issues
Personal privacy is the major concern. Information is captured when you make online transactions. People are afraid that somebody may gain access to their personal information and then use it in an unethical way, for example using your IP address to send out viruses or worms, or stealing your credit card number to buy things online.

Security Issues
Customer information is stored in databases. This information could be hacked, which would show that the company is not taking proper care of it.

Misuse of information/inaccurate information
The company may share the information with other companies. E.g. your information is captured by your bank. The bank might share information about your financial status with other banks or financial institutions. If you have a bad credit card record with one bank, you will have difficulty applying for a new credit card at other banks, as they already have your bad record and are unlikely to approve your application.

Source: http://cseserv.engr.scu.edu/StudentWebPages/hchhay/hchhay_FinalPaper.htm

5 Responses so far »

  1. 1

    Let’s be clear about the terminology: “data mining” is a sophisticated statistical analysis, not simple data gathering. All of the “disadvantages” you have listed deal with the inappropriate gathering, joining or sharing of data. Whether or not that data undergoes a sophisticated analysis (data mining) is incidental and irrelevant: even without data mining, these disadvantages would exist. I submit that the disadvantages you list, to the extent that they really are threats, are threats which arise from the construction and sharing of databases, not data mining.

  2. 2

    Dean Abbott said,

    The article cited uses the term “Pattern Warehouse” the next level step beyond data mining, and a technique that began in the late 90s. First of all, I’ve never heard of a “Pattern Warehouse”. So I did a search (with clusty.com, my favorite search engine) and didn’t find anything of substance. Is this a datawarehouse-centric view of data mining?

    The advantages listed in the article are fine, but the disadvantages, as Will indicates, are not related to data mining, but rather data joining (or gathering). I’d say that disadvantages of data mining are (in no particular order)
    * lack of model validation could yield misleading or blatantly incorrect results.
    * data as found naturally in data may not be suitable for predictive modeling, and therefore may produce poor models. Often, one needs substantial data preparation and feature creation to build models
    * missing data can be at a minimum problematic, and at worst devastating to data mining models

  3. 3

    ideahead said,

    If I wanted to data mine a feed such as UPI wire services, how could I set this up?

  4. 4

    ideahead said,

    I would like to do it graphically, similarly to:

    http://www.tenbyten.org/10x10.html

    A wonderful project by Jonathan J. Harris

  5. 5

    Sandro said,

    I agree with both Will and Dean about the drawbacks of data mining. I just want to add that, in my view, two major limitations of data mining are:

    - Users should know common traps in data mining
    - It is not yet an automated procedure, although some people argue that it can be.

