English 中文(简体)
Plotting Histogram: How can I do it from scratch using data stored in a database?
原标题:

I have some data stored in a database like this:

TableName: faults Table:

+------------+--------------+
| fault_type | total        |
+------------+--------------+
|    1       |            1 | 
|    2       |            3 | 
|    3       |            8 | 
|    4       |            2 | 
.............................

How am I supposed to get a histogram plot starting from this table?

最佳回答

The solution below assumes that you have MySQL, Python and GNUPlot. The specific details can be fine tuned if necessary. Posting it so that it could be a baseline for other peers.

Step #1: Decide the type of graph.

If it is a frequency plot of some kind, then a simple SQL query should do the trick:

select total, count(total) from faults GROUP BY total;

If you need to specify bin sizes, then proceed to the next step.

Step #2: Make sure you are able to connect to MySQL using Python. You can use the MySQLdb import to do this.

After that, the python code to generate data for a histogram plot is the following (this was written precisely in 5 minutes so it is very crude):

import MySQLdb

def DumpHistogramData(databaseHost, databaseName, databaseUsername, databasePassword, dataTableName, binsTableName, binSize, histogramDataFilename):
    #Open a file for writing into
    output = open("./" + histogramDataFilename, "w")

    #Connect to the database
    db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
    cursor = db.cursor()

    #Form the query
    sql = """select b.*, count(*) as total 
            FROM """ + binsTableName + """ b 
            LEFT OUTER JOIN """ + dataTableName + """ a 
            ON a.total between b.min AND b.max 
            group by b.min;"""
    cursor.execute(sql)

    #Get the result and print it into a file for further processing
    count = 0;
    while True:
        results = cursor.fetchmany(10000)
        if not results:
            break
        for result in results:
            #print >> output, str(result[0]) + "-" + str(result[1]) + "	" + str(result[2])
    db.close()

def PrepareHistogramBins(databaseHost, databaseName, databaseUsername, databasePassword, binsTableName, maxValue, totalBins):

    #Connect to the database    
    db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
    cursor = db.cursor()

    #Check if the table was already created
    sql = """DROP TABLE IF EXISTS """ + binsTableName
    cursor.execute(sql)

    #Create the table
    sql = """CREATE TABLE """ + binsTableName + """(min int(11), max int(11));"""
    cursor.execute(sql)

    #Calculate the bin size
    binSize = maxValue/totalBins

    #Generate the bin sizes
    for i in range(0, maxValue, binSize):
        if i is 0:
            min = i
            max = i+binSize
        else:
            min = i+1
            max = i+binSize
        sql = """INSERT INTO """ + binsTableName + """(min, max) VALUES(""" + str(min) + """, """ + str(max) + """);"""
        cursor.execute(sql)
    db.close()
    return binSize

binSize = PrepareHistogramBins("localhost", "testing", "root", "", "bins", 5000, 100)
DumpHistogramData("localhost", "testing", "root", "", "faults", "bins", binSize, "histogram")

Step #3: Use GNUPlot to generate the histogram. You can use the following script as a starting point (generates an eps image file):

set terminal postscript eps color lw 2 "Helvetica" 20
set output "output.eps"
set xlabel "XLABEL"
set ylabel "YLABEL"
set title "TITLE"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
set key autotitle columnheader
set xtics rotate by -45
plot "input" using 1:2 with linespoints ls 1

Save the above script into some arbitrary file say, sample.script. Proceed to the next step.

Step #4: Use gnuplot with the above input script to generate an eps file

gnuplot sample.script

Nothing complicated but I figured a couple of bits from this code can be reused. Again, like I said, it is not perfect but you can get the job done :)

Credits:

问题回答

This blog article may help you! It talks about statistic using gnuplot and plot the result into histogram.





相关问题
SQL SubQuery getting particular column

I noticed that there were some threads with similar questions, and I did look through them but did not really get a convincing answer. Here s my question: The subquery below returns a Table with 3 ...

please can anyone check this while loop and if condition

<?php $con=mysql_connect("localhost","mts","mts"); if(!con) { die( unable to connect . mysql_error()); } mysql_select_db("mts",$con); /* date_default_timezone_set ("Asia/Calcutta"); $date = ...

php return a specific row from query

Is it possible in php to return a specific row of data from a mysql query? None of the fetch statements that I ve found return a 2 dimensional array to access specific rows. I want to be able to ...

Character Encodings in PHP and MySQL

Our website was developed with a meta tag set to... <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> This works fine for M-dashes and special quotes, etc. However, I ...

Pagination Strategies for Complex (slow) Datasets

What are some of the strategies being used for pagination of data sets that involve complex queries? count(*) takes ~1.5 sec so we don t want to hit the DB for every page view. Currently there are ~...

Averaging a total in mySQL

My table looks like person_id | car_id | miles ------------------------------ 1 | 1 | 100 1 | 2 | 200 2 | 3 | 1000 2 | 4 | 500 I need to ...

热门标签