Archive for the ‘Machine Learning’ Category

How to ACTUALLY USE a saved model in Weka!

Wednesday, October 25th, 2006

This was a frustrating 30 minutes of searching to find out how to get Weka to do something that really should have been obvious.

In this data mining competition I was trying out, they give you a final test set without the actual answers, and your job is to predict what the actual values are and, well, send them in. I did manage to figure it out after a lot of searching, so I’m back on my path to winning ;-)

So here’s the easiest way to use a model to simply get its predictions without training it again:

If you right-click on the result list you’ll get a popup
with options including “Load model” which allows you to load
some saved model; and there is another option “Re-evaluate
model on the current test-set”; obviously before you can
do that, you will have to load some test-set: tick “Supplied
test set” and then the “Set” button and select your file …

I found that answer here, and later in the thread they also mention a way to do it with the command line interface (CLI).

If that quoted answer above still isn’t clear, allow me to summarize:

1. You open Weka and get that little window with the four buttons.
2. Click on Explorer.
3. Load any old bogus data set you’re not going to need, just so it lets you get to the Classify tab.
4. Then under Test options, choose “Supplied test set” and point it to the data you want it to make predictions on. (Note: this data must have all of the same attributes, in the same order, as the data you trained your model on.)
5. Finally, right-click in the Result list area and select “Load model”.
6. Point it to your saved model.
7. Next, right-click the model you just loaded and tell it to re-evaluate on the current test set.
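For anyone who would rather skip the GUI: as far as I know, a Weka .model file is a serialized Java object, which is why it can be reloaded and used without retraining. Here is a minimal sketch of that general idea in plain Java. It does not use Weka’s API; ThresholdModel, save, load and the temperature/humidity attributes are hypothetical stand-ins. The toy model also remembers the attribute names it was built on and refuses a test set whose header differs, mirroring the note about needing the same attributes in the same order.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class SavedModelDemo {

    // Hypothetical toy model (NOT a Weka class): predicts "yes" when the
    // first attribute is above a threshold fixed when the model was built.
    static class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> attributeNames; // header the model was built on
        final double threshold;

        ThresholdModel(List<String> attributeNames, double threshold) {
            this.attributeNames = attributeNames;
            this.threshold = threshold;
        }

        String predict(List<String> testHeader, double[] row) {
            // Same attributes, same order, or no prediction.
            if (!attributeNames.equals(testHeader)) {
                throw new IllegalArgumentException("Test set header does not match training header");
            }
            return row[0] > threshold ? "yes" : "no";
        }
    }

    // Write the model to disk once.
    static void save(ThresholdModel model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Read it back later; nothing is retrained.
    static ThresholdModel load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ThresholdModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> header = Arrays.asList("temperature", "humidity");
        File file = File.createTempFile("toy", ".model");
        file.deleteOnExit();

        // "Training" happens once (here the threshold is simply fixed for illustration).
        save(new ThresholdModel(header, 70.0), file);

        // Later: load the saved model and predict on rows with no answers.
        ThresholdModel loaded = load(file);
        double[][] unlabeled = { {85.0, 85.0}, {64.0, 65.0} };
        for (double[] row : unlabeled) {
            System.out.println(Arrays.toString(row) + " -> " + loaded.predict(header, row));
        }
    }
}
```

Running main builds the model once, writes it to a temporary file, reads it back and prints a prediction for each unlabeled row.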

Here’s what your output might look like:
[Screenshot: Weka actually using a model to make predictions without retraining OMG]

You know, all frustration aside, Weka really is an incredibly useful, free program, and perhaps I should be glad the developers are spending their time on the machine learning side of development instead of on my personal needs…

My search terms (watch the frustration build):

  1. weka run a model on test data
  2. weka use a model
  3. weka + “use a model”
  4. weka + (get OR output) + predictions + without + training
  5. weka + “without training”
  6. weka don’t have actual
  7. weka cli
  8. weka + sucks
  9. “weka sucks”

Here is more information about saving models in Weka.

Tags: Weka, machine learning, data mining, predictions

How to Run Weka in Linux

Sunday, August 13th, 2006

So you’ve downloaded and installed Weka on your Linux system, but when you click on Explorer or Experimenter you get this error:
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException ...

Just follow these steps to fix it:

  1. Create a “LookAndFeel.props” file in your home directory:
    Type: touch ~/LookAndFeel.props at a shell prompt.
  2. Type this into the first line of the newly created LookAndFeel.props file:
    Theme=javax.swing.plaf.metal.MetalLookAndFeel
  3. Save the file and exit.
  4. Start WEKA and everything should now work.
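The Theme line names a standard Swing look-and-feel class (Metal, Java’s cross-platform look and feel). I haven’t checked exactly how Weka applies this setting, but in your own Swing code the rough equivalent is a call to UIManager.setLookAndFeel before any component is created. A minimal sketch; the class and method names are made up for illustration:

```java
import javax.swing.UIManager;
import javax.swing.UnsupportedLookAndFeelException;

public class ForceMetalLookAndFeel {

    // Same class name as the Theme= line in LookAndFeel.props.
    static final String METAL = "javax.swing.plaf.metal.MetalLookAndFeel";

    // Call this before constructing any Swing component; components built
    // earlier keep the UI delegates of the previous look and feel.
    static String forceMetal() throws ClassNotFoundException, InstantiationException,
            IllegalAccessException, UnsupportedLookAndFeelException {
        UIManager.setLookAndFeel(METAL);
        return UIManager.getLookAndFeel().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Active look and feel: " + forceMetal());
    }
}
```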


Here’s a more detailed version of the answer that also explains a little of why this problem happens.

Here’s the full text of the original error:

chiefinnovator@MAIN1:~$ cd /home/chiefinnovator/weka/weka-3-4-8a/
chiefinnovator@MAIN1:~/weka/weka-3-4-8a$ java -jar weka.jar
/usr/share/themes/Simple/gtk-2.0/gtkrc:46: Engine "thinice" is unsupported, ignoring
/usr/share/themes/Simple/gtk-2.0/gtkrc:53: Engine "redmond95" is unsupported, ignoring
/usr/share/themes/Simple/gtk-2.0/gtkrc:57: Engine "redmond95" is unsupported, ignoring
---Registering Weka Editors---
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
        at weka.gui.explorer.PreprocessPanel.addPropertyChangeListener(Unknown Source)
        at javax.swing.plaf.synth.SynthPanelUI.installListeners(SynthPanelUI.java:49)
        at javax.swing.plaf.synth.SynthPanelUI.installUI(SynthPanelUI.java:38)
        at javax.swing.JComponent.setUI(JComponent.java:652)
        at javax.swing.JPanel.setUI(JPanel.java:131)
        at javax.swing.JPanel.updateUI(JPanel.java:104)
        at javax.swing.JPanel.<init>(JPanel.java:64)
        at javax.swing.JPanel.<init>(JPanel.java:87)
        at javax.swing.JPanel.<init>(JPanel.java:95)
        at weka.gui.explorer.PreprocessPanel.<init>(Unknown Source)
        at weka.gui.explorer.Explorer.<init>(Unknown Source)
        at weka.gui.GUIChooser$3.actionPerformed(Unknown Source)
        at java.awt.Button.processActionEvent(Button.java:388)
        at java.awt.Button.processEvent(Button.java:356)
        at java.awt.Component.dispatchEventImpl(Component.java:3955)
        at java.awt.Component.dispatchEvent(Component.java:3803)
        at java.awt.EventQueue.dispatchEvent(EventQueue.java:463)
        at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:242)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:163)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:157)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:149)
        at java.awt.EventDispatchThread.run(EventDispatchThread.java:110)
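Reading the trace from the bottom up: while Weka’s PreprocessPanel is still inside JPanel’s constructor, Swing installs the GTK look and feel’s Synth-based panel UI, whose installListeners calls back into PreprocessPanel.addPropertyChangeListener, and that is where the NullPointerException is thrown. My guess (I haven’t looked at Weka’s source) is the classic Java pitfall of a superclass constructor calling a method the subclass overrides before the subclass’s own fields are initialized. Switching to Metal presumably avoids it because, as far as I can tell, Metal’s panel UI doesn’t register a listener during construction. The toy classes below are hypothetical stand-ins, not Weka or Swing code:

```java
import java.util.ArrayList;
import java.util.List;

public class ConstructorCallbackDemo {

    // Plays the role of the look and feel: true ~ Synth/GTK (registers a
    // listener during construction), false ~ Metal (does not).
    static boolean synthLikeUi = true;

    // Stand-in for JPanel plus its UI delegate.
    static class BasePanel {
        BasePanel() {
            if (synthLikeUi) {
                addListener("ui-delegate"); // overridable call from a constructor
            }
        }

        void addListener(String name) {
            // default implementation does nothing
        }
    }

    // Stand-in for PreprocessPanel: overrides addListener and uses a field
    // that is only initialized after BasePanel's constructor has returned.
    static class PreprocessLikePanel extends BasePanel {
        private final List<String> listeners = new ArrayList<>();

        @Override
        void addListener(String name) {
            listeners.add(name); // listeners is still null while BasePanel() runs
        }
    }

    // Returns true if constructing the subclass throws a NullPointerException.
    static boolean constructionFails(boolean synthLike) {
        synthLikeUi = synthLike;
        try {
            new PreprocessLikePanel();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Synth-like UI fails: " + constructionFails(true));
        System.out.println("Metal-like UI fails: " + constructionFails(false));
    }
}
```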

Tags: weka, ubuntu, java, weka Exception, AWT-EventQueue-0, java.lang.NullPointerException

What is a Competitive Learning Network

Sunday, May 28th, 2006

A competitive learning network (CLN) is an unsupervised learning technique that is used to find similarities among the examples in a dataset and thereby suggest groupings of them. This is known as a clustering algorithm, since it “clusters” the data examples together into similar groups. It is called unsupervised because we do not tell the algorithm how to group the examples together; it is left to its own devices to figure out any similarities in the data.
A CLN is made up of two layers of nodes: an input layer and an output layer. Each node in the input layer connects to every node in the output layer, and each connection has a weight assigned to it. The values of these weights are where the network stores its knowledge. Each node in the input layer is associated with a feature in the dataset, and each node in the output layer corresponds to a potential grouping of the data.
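The rest of this post is cut off in the archive, so what follows is my own minimal sketch of the usual winner-take-all version, not the original write-up. Each output node has one weight per input feature; for every example, the output node whose weights are closest (squared Euclidean distance, an assumption) wins, and only the winner’s weights are nudged toward the example by a learning rate. The class name, the learning rate of 0.5, the epoch count and the toy data are all assumptions for illustration. The initial weights are copied from two of the examples, one common heuristic for keeping every node in play.

```java
import java.util.Arrays;

public class CompetitiveLearning {

    // weights[j][i] is the weight on the connection from input node i
    // (feature i) to output node j (potential cluster j).
    final double[][] weights;
    final double learningRate;

    CompetitiveLearning(double[][] initialWeights, double learningRate) {
        weights = new double[initialWeights.length][];
        for (int j = 0; j < initialWeights.length; j++) {
            weights[j] = initialWeights[j].clone(); // copy so training doesn't alter the caller's data
        }
        this.learningRate = learningRate;
    }

    // Index of the output node whose weight vector is closest to x.
    int winner(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < weights.length; j++) {
            double dist = 0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - weights[j][i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = j;
            }
        }
        return best;
    }

    // Winner-take-all learning: only the winning node's weights move toward the example.
    void train(double[][] data, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (double[] x : data) {
                double[] w = weights[winner(x)];
                for (int i = 0; i < x.length; i++) {
                    w[i] += learningRate * (x[i] - w[i]);
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][] data = { {0.10, 0.10}, {0.15, 0.20}, {0.90, 0.85}, {0.95, 0.90} };
        // Two output nodes, initialized from two of the examples.
        CompetitiveLearning net = new CompetitiveLearning(new double[][] { data[0], data[2] }, 0.5);
        net.train(data, 20);
        for (double[] x : data) {
            System.out.println(Arrays.toString(x) + " -> cluster " + net.winner(x));
        }
    }
}
```

With this toy data, the two points near (0.1, 0.1) share one output node and the two near (0.9, 0.9) share the other. The network only says which node wins; deciding what each group means is still up to you.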

When is the KDD Cup for 2006

Tuesday, April 18th, 2006

Siemens Medical is providing a problem, to be posted May 1, 2006.

I wrote to gmelli_sigkdd@predictionworks.com to find out. Hopefully it will be announced on their website soon.

What is the KDD Cup? It’s an annual data mining competition that my company Blended Technologies will be participating in this year.

Tags: Data Mining, Data Mining Competition, KDD Cup

How to Tell if a Binary Number is Divisible by Three

Monday, April 3rd, 2006

Answer:
If the number of 1 bits in even positions minus the number of 1 bits in odd positions (counting from position 0 at the least significant bit) is a multiple of 3 (e.g. -3, 0, 3, 6, etc.), then the number is divisible by three.
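Why it works: bit k is worth 2^k, and 2^k leaves a remainder of 1 when divided by 3 if k is even, and a remainder of 2 (that is, -1) if k is odd. So a number and (1s at even positions minus 1s at odd positions) always leave the same remainder mod 3. For example, 21 is 10101 in binary: three 1s at even positions and none at odd positions, 3 - 0 = 3, and indeed 21 / 3 = 7. A short Java sketch of the rule (the class and method names are mine, and it only accepts non-negative numbers), with a brute-force cross-check against the % operator:

```java
public class DivisibleByThree {

    // 1s at even bit positions (0, 2, 4, ...) and at odd bit positions (1, 3, 5, ...).
    private static final long EVEN_POSITIONS = 0x5555555555555555L;
    private static final long ODD_POSITIONS = 0xAAAAAAAAAAAAAAAAL;

    // n is divisible by 3 iff (1s at even positions - 1s at odd positions) is a multiple of 3.
    static boolean isDivisibleByThree(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        int evenOnes = Long.bitCount(n & EVEN_POSITIONS);
        int oddOnes = Long.bitCount(n & ODD_POSITIONS);
        return (evenOnes - oddOnes) % 3 == 0;
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(21) + " divisible by 3? " + isDivisibleByThree(21));
        // Cross-check the rule against ordinary arithmetic.
        for (long n = 0; n <= 1000; n++) {
            if (isDivisibleByThree(n) != (n % 3 == 0)) {
                throw new AssertionError("rule fails for " + n);
            }
        }
        System.out.println("Rule agrees with n % 3 for 0..1000");
    }
}
```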