Sunday, January 25, 2015

Free Five Course Series on QGIS Starts Soon

Del Mar College is offering a free online course that gives an introduction to GIS and QGIS.  The course is titled "Introduction to Geospatial Technology Using QGIS" and is available from the Canvas Network.  The five-week course is self-paced and runs from February 23rd to March 27th. Already 1,000 students are signed up. The courses were created with funds from the National Science Foundation (NSF) and US Department of Labor.

It is great to see a course geared towards QGIS!  The course covers components of the core competencies for entry-level geospatial occupations as outlined here.  It includes lectures and hands-on exercises.  If you can't wait or can't find the time, the materials for this course and others are available on GitHub: https://github.com/FOSS4GAcademy.

According to the OSGeo listserv, this course is part of a larger effort to educate people about GIS and FOSS GIS called "Geo For All" and GeoAcademy.  This is the first course in a sequence of five, so more courses will be on the way!

The other four courses are:
  • Spatial Analysis Using QGIS
  • Data Acquisition and Management Using QGIS
  • Cartography Using QGIS
  • Remote Sensing Using QGIS
If you know someone who is interested in GIS or QGIS and likes independent study, it looks like a solid opportunity!  
For more information: https://www.canvas.net/browse/delmarcollege/courses/cn-1681-intro-qgis

Course offerings from Penn State University (PSU) and Coursera: 
http://opensourcegisblog.blogspot.com/2014/08/free-online-mapping-classes-from-psu.html

Monday, January 19, 2015

Using R to Prepare a Case File for SaTScan

SaTScan requires several different types of files for analysis: 1) A case file with columns for the geographic unit, the time (day, month, or year; see the documentation), and the number of cases.  You can aggregate the data into any geographic unit, large or small. 2) A geographic coordinate file (Cartesian or lat/long) with the name of each unit (e.g., census tract) and the x and y coordinates of its centroid. 3) A population file with the estimated population of each unit over the time period, by year.
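To make the case file concrete, here is a minimal Python sketch (not the R code from this post) that writes a few rows as space-delimited text. The column order (location ID, case count, time) and the YYYY/MM time format are my reading of the case-file layout in the SaTScan manual, so treat both as assumptions and confirm them against the documentation. The field names `post`, `cases`, `year`, and `month` are hypothetical.

```python
import io

def write_case_file(rows, handle):
    """Write one space-delimited line per row: location ID, case count, time."""
    for row in rows:
        # Assumed layout: <location id> <#cases> <YYYY/MM>
        handle.write(f"{row['post']} {row['cases']} {row['year']}/{row['month']:02d}\n")

# Hypothetical post-month rows; real rows would come from the prepared data.
rows = [
    {"post": "111", "year": 2013, "month": 1, "cases": 2},
    {"post": "111", "year": 2013, "month": 2, "cases": 0},
]

buffer = io.StringIO()  # stands in for an open file on disk
write_case_file(rows, buffer)
print(buffer.getvalue(), end="")
```

Swapping the `io.StringIO` buffer for `open("homicides.cas", "w")` would write the same lines to disk; the file name there is only an example.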

In this post, I will describe creating a case file using code in R.  The goal is to create a sum of homicides by month, year (just 2013 for this example), and police beat/post.  We won't worry about any other specifics (e.g., degree) or related types of crimes, such as shootings.

To ready yourself for data preparation, read Richard Block's tutorial or the more extensive SaTScan manual.

I use crime data from Chicago's Open Data Portal.  The same code can be applied to other types of data, such as health data.  A few key points: 1) the source data are victim-based (one record per victim), and we want to convert them into incidents; 2) not every post has a homicide; and 3) the reference post list contains 275 posts.  So, we will end up with a data set of 3,300 rows (275 posts x 12 months), or simply one row for each post-month.

If you want to skip ahead and just look at the code, go to: http://goo.gl/pmOi1u.


At the top: What you start with.  Bottom: After processing in R

Overview of Steps: See the code for further details

Step #1: Two files are imported: 1) a victim-based file of all crimes, which is narrowed down to just homicides (you could also add in shootings) and 2) a 'reference' file or simply a list of the police beats/posts in Chicago.

Step #2:  The data are first summed so that each row contains the total number of victims, then grouped again into incidents, using two different count variables.
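As a rough illustration of the two counts in Step #2, here is a minimal Python sketch (again, not the linked R code). It assumes each victim record carries an incident identifier; the field names `case_id`, `post`, `year`, and `month` and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical victim-level records: one row per homicide victim.
victims = [
    {"case_id": "A1", "post": "111", "year": 2013, "month": 1},
    {"case_id": "A1", "post": "111", "year": 2013, "month": 1},  # second victim, same incident
    {"case_id": "B7", "post": "111", "year": 2013, "month": 1},
    {"case_id": "C3", "post": "212", "year": 2013, "month": 6},
]

def count_by_post_month(rows):
    """Return {(post, year, month): {"victims": n, "incidents": m}}."""
    # First count: number of victims within each incident.
    victims_per_incident = defaultdict(int)
    for row in rows:
        key = (row["post"], row["year"], row["month"], row["case_id"])
        victims_per_incident[key] += 1

    # Second count: one per distinct incident within each post-month.
    totals = defaultdict(lambda: {"victims": 0, "incidents": 0})
    for (post, year, month, _case_id), n_victims in victims_per_incident.items():
        cell = totals[(post, year, month)]
        cell["victims"] += n_victims
        cell["incidents"] += 1
    return dict(totals)

counts = count_by_post_month(victims)
```

Keeping both counts side by side makes it easy to spot post-months where a single incident had several victims.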

Step #3: The list of police beats gets a column for each month of the year and is then expanded by reshaping the data from wide to long.  This serves as a 'reference list' for matching purposes.
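The result of Step #3 is one row for every post and every month. The Python sketch below builds that long table directly from a cross product instead of reshaping from wide to long, which is a different route to the same shape. The post IDs are placeholders, not real Chicago beat numbers.

```python
from itertools import product

def build_reference(posts, year):
    """Return one (post, year, month) row for every post and every month."""
    return [(post, year, month) for post, month in product(posts, range(1, 13))]

# Placeholder IDs; the real reference file lists 275 posts.
posts = [f"{n:04d}" for n in range(1, 276)]
reference = build_reference(posts, 2013)

print(len(reference))  # 275 posts x 12 months
```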

Step #4:  The two data sets are matched, and the 'unmatched' records are also kept.  These are post-months that don't have a homicide, so each missing count value is replaced with a zero.
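In other words, Step #4 is a left join from the reference list onto the counts, with missing values filled in as zero. A minimal Python sketch of that idea follows; it assumes the counts are keyed by (post, year, month), and the sample values are made up.

```python
def zero_fill(reference, counts):
    """Keep every reference row; unmatched post-months get a count of 0."""
    return [
        {"post": post, "year": year, "month": month,
         "cases": counts.get((post, year, month), 0)}
        for post, year, month in reference
    ]

# Hypothetical inputs: two posts for all of 2013, and two post-months with homicides.
reference = [(post, 2013, month) for post in ("111", "212") for month in range(1, 13)]
counts = {("111", 2013, 1): 2, ("212", 2013, 6): 1}

case_rows = zero_fill(reference, counts)
```

Because the loop runs over the reference list and not the counts, a post-month can never be dropped just because nothing happened there.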

Step #5: To ensure the code has worked, I check the total number of rows (3300) and spot check various posts to make sure the data have been grouped into incidents and posts correctly.
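Checks like these are easy to automate. Here is a small Python sketch of the kind of sanity checks described in Step #5. The function name, field names, and sample rows are hypothetical, and the duplicate and total checks are my own additions beyond the row count and spot checks mentioned above.

```python
def check_case_rows(case_rows, n_posts, expected_total):
    """Raise AssertionError if the case table fails basic sanity checks."""
    keys = [(r["post"], r["year"], r["month"]) for r in case_rows]
    assert len(case_rows) == n_posts * 12, "expected one row per post-month"
    assert len(set(keys)) == len(keys), "duplicate post-month rows"
    assert all(r["cases"] >= 0 for r in case_rows), "negative count"
    total = sum(r["cases"] for r in case_rows)
    assert total == expected_total, "incidents lost or double-counted"
    return True

# Hypothetical table: two posts, twelve months each, three incidents in total.
known = {("111", 1): 2, ("212", 6): 1}
sample_rows = [
    {"post": post, "year": 2013, "month": month, "cases": known.get((post, month), 0)}
    for post in ("111", "212")
    for month in range(1, 13)
]

ok = check_case_rows(sample_rows, n_posts=2, expected_total=3)
```

For the full Chicago data, `n_posts` would be 275, which gives the 3300 rows expected above.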

Whether in R or in for-fee software (e.g., SAS, Stata), preparing data for SaTScan is relatively straightforward, but it does involve a number of steps.

Update #1 (2/18/15)
Scan statistics can also be implemented in R with the SpatialEpi and rsatscan packages.

Wednesday, January 7, 2015

FGBASE: Fast Grid-Based Spatial Data Mining

FGBASE is new open source software for running scan statistics on gridded data.  Unlike SaTScan, which runs on Windows, Linux, and Mac OS X, FGBASE currently runs only on Mac OS X (10.6, 10.7, and 10.8).  Also unlike SaTScan, its source code is available for download here: http://www.fgbase.org/download-fgbase/.  The software was specifically created for environmental epidemiology but has potential applications in any field of study concerned with finding clusters.

Analyzing aggregated data, in either software package, helps speed up the computationally intensive calculations needed to find spatial, temporal, or spatiotemporal clusters.

Comparison of FGBASE and SaTScan


                      FGBASE      SaTScan
Operating system(s)   Mac OS X    Windows, Linux, Mac OS X
Open source code      Yes         No
Geographic output     In app      New: Export to KML or SHP
Sample data sets      Yes, 1      Yes, several
Documentation         TBD         Extensive
Publications          1           Extensive, hundreds
           
FGBASE comes with some sample data (available at: http://www.fgbase.org/user-data/), but keep in mind that the program was only recently released.  Aside: the sample data set is different from the one used in the published paper, so what you see on your screen will differ from the paper.  The same page describes which data sets you will need and how they should be structured.

Clusters can be examined using a data-driven approach that answers the question: where are the clusters?  Or a hypothesis-driven approach can be used: are there clusters relative to one or more sources of exposure, where entities (factories, etc.) may be responsible for the clustering of cases?

A stock screenshot of FGBASE. Source: IJHG
I downloaded and installed FGBASE.  I will check back in with more impressions in a few months.  Adding documentation with a tutorial, or even a short YouTube video, could greatly aid users.  I also plan to blog about getting data into SaTScan and interpreting results later in the year.  Since FGBASE's source code is public, hopefully this will speed further development of the program and aid troubleshooting.

Read more at the International Journal of Health Geographics:
http://www.ij-healthgeographics.com/content/pdf/1476-072X-13-46.pdf

See also:
TreeScan
R: SpatialEpi package
There is also an experimental SaTSViz plugin in QGIS, but I have not had a chance to look at it.