Author Topic: OT - Data Mining?  (Read 922 times)

jemagee

  • Guest
OT - Data Mining?
« on: September 23, 2008, 01:40:42 PM »
It seems like there are a lot of tech-oriented people here, so I thought I'd ask about data mining.  We have a web database and a separate sales database for non-web sales, covering a variety of customers, sales channels, and flavors.  All of this data is available to me, since I can write the SQL queries (the basic ones, at least) to get the information out.  But at times I feel overwhelmed by what I'm looking at and don't know how to figure out what I 'should' be looking at.  Is that what data mining is?  If so, is there a good data mining tutorial out there?

Just curious

Offline westkoast

  • Hero Member
Re: OT - Data Mining?
« Reply #1 on: September 23, 2008, 04:33:52 PM »
I only know of data mining in the sense of tracking cookies and other ways to watch how someone surfs the internet.  That of course is done through adware and is normally followed by pop ups.

jemagee

  • Guest
Re: OT - Data Mining?
« Reply #2 on: September 23, 2008, 04:36:30 PM »
Quote from: westkoast
I only know of data mining in the sense of tracking cookies and other ways to watch how someone surfs the internet.  That of course is done through adware and is normally followed by pop ups.

Cookies have noninvasive uses as well; they are an incredibly useful tool.  If you stay logged in on phillyarena, it's probably because of a cookie.  But like all things, they have their abusers, and the media blows it out of proportion, so the average user is terrified of cookies and thinks they'll somehow take over their system, even though cookies have limited functionality :)

My lay understanding of data mining is that it's the 'nouveau' term for statistical analysis of the information held in databases.  I think it's an old idea with a new name, brought on by the advent of the internet and all these databases and available information.

I hate knowing 'a little' about a lot of things but having no passion for anything that would get me the in-depth knowledge :)

Offline WayOutWest

  • Hero Member
Re: OT - Data Mining?
« Reply #3 on: September 23, 2008, 06:34:21 PM »
I've heard of data mining tools for over a decade.  I'm not sure what the true definition is, but in my experience it's used as a "filter".  A ton of the work I've done on what have been called "data mining" projects in my field has involved tons and tons of SQL queries, some simple and some extremely complex.

The basics are that it's a GUI: nothing but button presses and pull-down menus.  More complicated tools like a query builder are OK but sort of defeat the purpose.  So with some pull-down menus and maybe a few logic parameters (greater than, equal to, etc.), you have a basic data mining front end.  The queries behind the button presses are where the bulk of the heavy lifting is done.
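As a rough sketch of "queries behind the button presses", here is one way two pull-down selections could be turned into a parameterized query. The `readings` table, the column names, and the menu labels are all made up for illustration, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical menu choices; whitelisting keeps user input out of the SQL text.
COLUMNS = {"temperature", "flow_rate", "level"}
OPERATORS = {"greater than": ">", "equal to": "=", "less than": "<"}


def build_query(column, operator_label):
    """Turn two pull-down selections into a parameterized SQL string."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    if operator_label not in OPERATORS:
        raise ValueError(f"unknown operator: {operator_label}")
    op = OPERATORS[operator_label]
    return f"SELECT ts, {column} FROM readings WHERE {column} {op} ? ORDER BY ts"


def run_filter(conn, column, operator_label, threshold):
    """Run the query a button press would trigger; threshold is a bound parameter."""
    return conn.execute(build_query(column, operator_label), (threshold,)).fetchall()


def make_demo_db():
    """Build a tiny in-memory table standing in for real plant data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (ts INTEGER, temperature REAL, flow_rate REAL, level REAL)"
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        [(1, 40.0, 3.0, 1.2), (2, 55.0, 3.5, 1.1), (3, 61.0, 0.0, 1.0)],
    )
    return conn
```

Calling `run_filter(make_demo_db(), "temperature", "greater than", 50)` returns the rows whose temperature exceeds 50. The user only ever picks from fixed lists, and the one free-form value is bound as a parameter.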

I have a plant being monitored and controlled by a SCADA (Supervisory Control And Data Acquisition) system.  This system collects dozens of samples per second on hundreds, or thousands, of I/O points (temps, on/off status, flow rates, levels, etc.).  So it's possible to have tens or hundreds of thousands, or even millions, of data points in the system, 99% of which are useless to 99% of the users.

Simple stuff: pump starts.  Just run a simple query for Off-to-On transitions of a discrete (on/off) input point within a given time frame.
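A minimal sketch of that pump-start query, assuming the samples live in a hypothetical `samples` table with `point_id`, `ts`, and `value` columns (0 = off, 1 = on). The names are invented, and SQLite stands in for whatever historian database the plant actually uses.

```python
import sqlite3


def find_pump_starts(conn, point_id, start_ts, end_ts):
    """Return timestamps where a discrete point goes from 0 (off) to 1 (on)."""
    rows = conn.execute(
        "SELECT ts, value FROM samples "
        "WHERE point_id = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (point_id, start_ts, end_ts),
    ).fetchall()
    starts = []
    previous = None  # no transition can be detected on the first sample
    for ts, value in rows:
        if previous == 0 and value == 1:
            starts.append(ts)
        previous = value
    return starts


def make_demo_db():
    """Build an in-memory table with ten made-up samples for one pump."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (point_id TEXT, ts INTEGER, value INTEGER)")
    states = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [("PUMP_1", ts, value) for ts, value in enumerate(states)],
    )
    return conn
```

The query pulls only one point and one time window, and the loop then compares each sample with the one before it. A start that happens exactly at the first sample in the window is not counted, because there is no earlier sample to compare against.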

A bit more complex would be determining the volumetric flow rate of the gasoline the pump is moving.  You have to query various I/O points, a mix of analog and digital, and use the results to calculate a volumetric flow rate.  Then you probably trend it over a given time frame.  It would be easy if you had a volumetric flow meter, but typically you only have a flow rate.  You then have to take into account things like density, specific gravity, temperature, etc.
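A hedged sketch of that calculation, assuming the reading you do have is a mass flow rate in kg/h and that density can be approximated as specific gravity times the density of water. Temperature correction is left out here, and the function names and the 0.74 figure in the usage note are illustrative only.

```python
# Approximate density of water, the usual reference for specific gravity.
WATER_DENSITY_KG_M3 = 1000.0


def volumetric_flow_m3_per_h(mass_flow_kg_per_h, specific_gravity):
    """Convert a mass flow rate to a volumetric flow rate: Q = m_dot / rho."""
    if specific_gravity <= 0:
        raise ValueError("specific gravity must be positive")
    density_kg_m3 = specific_gravity * WATER_DENSITY_KG_M3
    return mass_flow_kg_per_h / density_kg_m3


def trend(samples, specific_gravity):
    """Map (timestamp, mass_flow_kg_per_h) samples to (timestamp, m3_per_h)."""
    return [
        (ts, volumetric_flow_m3_per_h(mass_flow, specific_gravity))
        for ts, mass_flow in samples
    ]
```

For example, `volumetric_flow_m3_per_h(7400.0, 0.74)` works out to about 10 m3/h. A real system would also correct the density for the measured temperature before dividing.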

I've done some even more complex queries when I was looking at a data structure that was three-dimensional.  It was used for scheduling production runs of certain types of machines for a sprinkler factory.

The "automation industry" data mining methods I've used can be applied to many applications using all kinds of data.  You are basically taking in massive amounts of data and filtering out what you want.  It sounds like your data will be retrieved from several different DB sources, which will add complexity to your queries.
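To illustrate pulling from several DB sources, here is a small sketch that runs the same summary query against two separate databases and merges the results per flavor. The `sales` table, its columns, and the channel names are assumptions made up for the example, with in-memory SQLite databases standing in for the real ones.

```python
import sqlite3
from collections import defaultdict


def make_db(rows):
    """Create an in-memory database with a hypothetical sales(flavor, amount) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (flavor TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def totals_by_flavor(sources):
    """Sum sales per flavor in each source.

    sources maps a channel name (e.g. "web") to an open connection.
    Returns {flavor: {channel: total}}.
    """
    totals = defaultdict(dict)
    for channel, conn in sources.items():
        query = "SELECT flavor, SUM(amount) FROM sales GROUP BY flavor"
        for flavor, total in conn.execute(query):
            totals[flavor][channel] = total
    return dict(totals)
```

Each database does its own aggregation, so only the small summary rows have to be combined in code. If the two sources used different table layouts, each would need its own query, which is where the extra complexity comes in.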

The easiest way to attack a problem like this is to ask "what is it you want to see/know?"  You need very specific criteria and data points to identify.  Then you just isolate the points you need and query away.  Now, if you're talking about things like "trends" of somewhat "vague" variables, then you need to work on trimming them down to specifics; otherwise it will become "overwhelming", as you stated.  I've heard "I want to know what the plant is doing".  OK, WTF does that mean?  Are you looking for production rates? Resource consumption? Idle time? WHAT?!?!  Break it down into quantifiable and manageable tasks/goals and just start knocking them down one at a time.
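As one example of narrowing "what is the plant doing" to a single quantifiable question, here is a sketch that computes idle time for one on/off point. It assumes the samples are already sorted `(timestamp_seconds, value)` pairs and that each value holds until the next sample. The function names are hypothetical.

```python
def idle_seconds(samples, window_end):
    """Sum the time a discrete point spent off (value 0).

    samples: list of (timestamp_seconds, value) sorted by timestamp.
    Each value is assumed to hold until the next sample, or until
    window_end for the last one.
    """
    idle = 0
    next_times = [ts for ts, _ in samples[1:]] + [window_end]
    for (ts, value), next_ts in zip(samples, next_times):
        if value == 0:
            idle += next_ts - ts
    return idle


def idle_fraction(samples, window_end):
    """Idle time as a fraction of the window starting at the first sample."""
    if not samples:
        return 0.0
    total = window_end - samples[0][0]
    if total <= 0:
        return 0.0
    return idle_seconds(samples, window_end) / total
```

With samples `[(0, 1), (60, 0), (180, 1), (240, 0)]` and a window ending at 300, the point is off from 60 to 180 and from 240 to 300, so `idle_seconds` gives 180. Production rate or resource consumption would each get their own equally narrow calculation.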
« Last Edit: September 24, 2008, 08:42:15 AM by WayOutWest »