Managing PANGAEA Research Data, with Visualization Methods for Ocean Data
Abstract: In this
exercise you'll find and explore the PANGAEA database (which should
still be called the World Database of Environmental Research Data).
Due to massive effort by the publishers, it contains thousands of
research measurements, carefully digitized into standard formats, and
fully indexed by robust metadata. After finding and obtaining data
from a very easy graphical interface, you can use a special application
provided by PANGAEA to convert the data to the Ocean Data View (ODV)
software format. The conversion is so easy that once done the new
file opens immediately as an ODV collection. This allows users to
pursue and gather needed parameters not typically found in existing
Preliminary Reading (in
OceanTeacher, unless otherwise indicated):
Microsoft Visual C++ 2010
Redistributable Packages - 64- or 32-bit auxiliary programs to
Pan2Applic - Special
program to convert single or aggregated PANGAEA files to Ocean Data View
PANGAEA - "An
Open Access library aimed at archiving, publishing and distributing
georeferenced data from earth system research." [Website]. In
basic terms, the publishers have undertaken the monumental (and
admirable) task of digging into thousands of scientific research
publications to find the original data tables from in situ sampling and
measurements, and have organized them into their own standard format.
Digital Object Identifier (DOI) - Permanent character string (a
"digital identifier") used to uniquely identify an object such as an
electronic document (or more recently a datafile).
|1. Install the Pan2Applic software on your computer. The website pages
about installation are hopelessly complex, so just follow the directions on
the MDL software page (above link).
|2. Open the main PANGAEA
webpage, and take some time to read about this amazing project. In
essence, they are working to retrieve and digitize huge amounts of
environmental data that went into research but were never formally published
for use by the world.
You can see from the top-level categories how big
this job will be. It is to the credit of the authors that they have
done a magnificent job so far in bringing tons of data back from the grave.
Click in the ADVANCED SEARCH control to continue.
|3. In the SEARCH TERMS part of the
search page, if you leave ENVIRONMENT set to ALL you sometimes get
an incomplete search listing, so make sure to select WATER. For
PARAMETER we've specified primary productivity, an extremely important
variable that is not included in the World Ocean Database yet.
HELP - Click here any time to see search options. There does
not appear to be a data dictionary in play here, so be flexible and creative
in your searches.
|4. You can explore the
temporal search function later on your own. For now, just leave the
|5. In the GEOGRAPHIC COVERAGE
part of the search page, you have several options for selecting an area.
|6. We'll take the easiest
route, and simply enter the limits of the Liberia area (carefully including
the negative signs where needed). Then click on the "compass" symbol
in the midst of the 4 geographic limits. This zooms you into the area,
and draws a nice rectangle.
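The sign convention in step 6 is easy to get wrong, so here is a minimal Python sketch of a bounding box that checks its own limits. The class name, the method, and the numeric limits are all illustrative assumptions; the limits are only a rough box around Liberia, not the values used in this exercise, and the check does not handle boxes that cross the 180-degree meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Geographic limits in decimal degrees (west and south are negative)."""
    north: float
    south: float
    west: float
    east: float

    def __post_init__(self):
        if not (-90 <= self.south <= self.north <= 90):
            raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
        if not (-180 <= self.west <= 180 and -180 <= self.east <= 180):
            raise ValueError("longitudes must lie between -180 and 180")
        if self.west > self.east:
            # Typical symptom of a forgotten negative sign on a west longitude.
            raise ValueError("west is greater than east: check the negative signs")

    def contains(self, lat, lon):
        """True if the point lies inside the box, edges included."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Illustrative limits only (assumption), roughly enclosing Liberia.
box = BoundingBox(north=9.0, south=4.0, west=-12.0, east=-7.0)
print(box.contains(6.3, -10.8))  # a point near the Liberian coast
```

Entering the west limit as 12 instead of -12 raises the "check the negative signs" error, which is the same mistake the search form silently accepts.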
|7. Click on SEARCH (either
top or bottom of the page), and after a short wait you'll get a report like
this if there are any corresponding data. We have access now to 24
There is an interesting message that tells us how PANGAEA breaks
all multi-word search tokens into separate parts, and uses them as if they
included an "OR" token.
Take the time to scan through the 24 listings, just to get an idea of the
types of original sources. Many (most?) of them do not have "primary"
or "productivity" in the titles, so considerable examination and careful
indexing has been performed by the PANGAEA folks.
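The "OR" behavior described in step 7 can be pictured with a small Python sketch. This is only an illustration of the idea, not PANGAEA's actual search code; the function name and the sample titles are hypothetical.

```python
def matches_any_token(query, text):
    """Return True if at least one word of the query occurs in the text."""
    tokens = query.lower().split()
    words = set(text.lower().split())
    return any(token in words for token in tokens)


# Hypothetical titles, for illustration only.
titles = [
    "Primary production measured during a coastal cruise",
    "Chlorophyll a profiles from bottle samples",
]
for title in titles:
    print(matches_any_token("primary productivity", title), "-", title)
```

With OR matching, a record needs only one of the words to qualify, so multi-word searches return more records than you might expect.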
|8. At the very
top-right of the report, click on SHOW MAP. This really well-done
chart appears. Take the time to read about the symbols used
(lower-right corner) and click on the objects to see where they came from.
|9. For an example, click on
one of the sites in the small row along the Liberia coast, to see the title of
the data. Morel, Behrenfeld and Falkowski are some of the greatest
names in ocean productivity, so these are probably very fine data. But
these data points are for remote sensing data, which we don't need right
|10. Click on one of the
aggregate symbols, and you'll see this report from in situ measurements.
Note that there is a DOI number for these data.
|11. Click on the
DOI to bring up the full text of its metadata. This may be the longest
list of authors we've ever seen, but this is perfectly OK due to the
collegial nature of complex ocean survey measurements.
|12. Look closely at the very
bottom left corner, and you'll find a DOWNLOAD DATASET link. Click
this link to obtain these data as a tab-delimited, ASCII text file.
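If you want to look inside the downloaded file before converting it, the following Python sketch reads a tab-delimited text file of this kind. It assumes the file begins with a descriptive header block enclosed between "/*" and "*/", followed by one row of column names; check your own download before relying on that. The function name, the column names, and the sample values are made up for illustration.

```python
import csv
import io


def read_pangaea_tab(text):
    """Skip an optional /* ... */ header block, then parse the tab-delimited table."""
    lines = text.splitlines()
    start = 0
    if lines and lines[0].startswith("/*"):
        for i, line in enumerate(lines):
            if line.strip().endswith("*/"):
                start = i + 1  # the table begins on the line after the header
                break
    table = "\n".join(lines[start:])
    return list(csv.DictReader(io.StringIO(table), delimiter="\t"))


# Made-up sample content (assumption), standing in for a real download.
sample = "\n".join([
    "/* DATA DESCRIPTION:",
    "Citation:\t(made-up header line for illustration)",
    "*/",
    "Latitude\tLongitude\tDepth water [m]\tPP C [mg/m**3/day]",
    "5.50\t-10.25\t0\t12.3",
    "5.50\t-10.25\t10\t9.8",
])

rows = read_pangaea_tab(sample)
for row in rows:
    print(row["Depth water [m]"], row["PP C [mg/m**3/day]"])
```

All values come back as text, so convert them to numbers yourself before doing any arithmetic.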
|13. Save the file
in the folder DATA > OCEAN > PANGAEA > WORK with the filename
PV24_surface_prime_prod_pangaea_684809.tab. That's the original
filename from PANGAEA plus a little more added by this author from the DOI
Everything in the WORK folder will be merged and analyzed below, so make
sure you have removed any data files from previous PANGAEA work.
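A quick way to confirm what is sitting in the WORK folder is to list its .tab files. The Python sketch below does this for a temporary folder that stands in for DATA > OCEAN > PANGAEA > WORK; the function name and the "old_download.tab" file are hypothetical.

```python
import tempfile
from pathlib import Path


def list_tab_files(work_dir):
    """Return the names of all .tab files directly inside work_dir, sorted."""
    return sorted(p.name for p in Path(work_dir).glob("*.tab"))


# Throwaway folder standing in for the real WORK folder.
with tempfile.TemporaryDirectory() as work:
    Path(work, "PV24_surface_prime_prod_pangaea_684809.tab").write_text("")
    Path(work, "old_download.tab").write_text("")  # hypothetical leftover file
    for name in list_tab_files(work):
        print(name)
```

Point the function at your own WORK folder and remove anything in the listing that does not belong to the current exercise.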
|14. Go back to the map, and
find another aggregate data symbol nearby, from the same research group, and
save it in the same way.
|15. While you are saving it,
make sure that the data report shows the variable you really want.
[You can ignore the others.] This list does include primary
productivity, so we should go ahead and get these data.
|16. Save these
data in the folder DATA > OCEAN > PANGAEA > WORK with the filename
PV24_phys_oce_pigments_PP_pangaea_756636.tab. This is also the
author's invention, based on the original PANGAEA filename and the DOI.
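Because the filenames above end with the number taken from the DOI, that number can be recovered later to find the metadata page again. The Python sketch below assumes the author's naming pattern (a trailing "_number.tab") and assumes that PANGAEA DOIs take the form 10.1594/PANGAEA.number, resolved through doi.org; both function names are hypothetical.

```python
import re


def pangaea_id_from_filename(name):
    """Extract the trailing dataset number from a name like ..._684809.tab."""
    match = re.search(r"_(\d+)\.tab$", name)
    if match is None:
        raise ValueError(f"no trailing dataset number in {name!r}")
    return match.group(1)


def doi_url(dataset_id):
    """Build a resolver URL, assuming the 10.1594/PANGAEA.<number> DOI pattern."""
    return f"https://doi.org/10.1594/PANGAEA.{dataset_id}"


dataset_id = pangaea_id_from_filename("PV24_surface_prime_prod_pangaea_684809.tab")
print(doi_url(dataset_id))
```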
|17. Now we're
ready to merge these data and convert them to Ocean Data View format.
If you read through the Pan2Applic instructions, you'll find a number of
different routes from PANGAEA to ODV, different from the method shown below.
Please explore them all and select the method that works best for you.
|18. Run Pan2Applic.
Select FILE > SELECT FOLDER.
|19. Navigate to your WORK
folder, and select CHOOSE to make sure this is where Pan2Applic will do its
|20. Make sure you can see
your files there.
|21. Go back to the main menu
of Pan2Applic and select CONVERT > OCEAN DATA VIEW.
NOTE: You have
other choices of interest, including shapefiles. Remember that if you
need to add station locations to your own project map.
|22. This rather frightening
page opens. By trial-and-error (my favorite method) this author has
gotten excellent results with the settings you see here. You are
invited to contact me and set me straight if you see a better way.
- Make sure that for OUTPUT FILE, you use the BROWSE control to set the
location to the DATA > OCEAN > PANGAEA folder.
- Make sure that for PROGRAM, you use the BROWSE control to locate the
ODV4.EXE file in your most current version of Ocean Data View.
- Make sure START OCEAN DATA VIEW is checked.
Click OK, and the conversion process will begin.
|23. Just click OK.
24. Now you can see that Pan2Applic has gone through both
datafiles, and combined their field structures into a single list. The
upper items (indicated by 1:) are from the metadata for the tables, holding information about the circumstances of the collection.
The lower items (indicated by 2:) are the actual data value labels.
Ctrl-Click to select the data items shown here. Then use the ==>
control to move them to the target area on the right. Very obviously
this is oversimplifying some matters that you should investigate on your own
later. For now, click on OK to create the combined new table file
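Combining the field structures of two files into a single list amounts to an ordered union of their column names. The Python sketch below shows that idea only; it is not Pan2Applic's own code, and the function and field names are hypothetical.

```python
def merge_field_lists(*field_lists):
    """Combine several field lists into one, keeping first-seen order, no repeats."""
    merged = []
    seen = set()
    for fields in field_lists:
        for name in fields:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical field names, for illustration only.
file_a = ["Event", "Latitude", "Longitude", "Depth water [m]", "PP C"]
file_b = ["Event", "Latitude", "Longitude", "Depth water [m]", "Chl a", "PP C"]
print(merge_field_lists(file_a, file_b))
```

A field that exists in only one file still appears in the merged list, so expect empty cells for the stations that came from the other file.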
|24. Words to the
wise. We know from experience that Pan2Applic will aggregate all the
station data into a single data file with the name WORK.ODV.TXT, and
into an ODV collection with the name WORK.ODV, so you might want to
add the AREA, CAMPAIGN or EVENT fields when you are working with your own
data archeology, so that the resulting collection can be examined more
critically than you can do with a simple 1-file aggregation.
In fact, you'll probably need all
the data, eventually, but the author is just taking the shortest route here.
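The value of keeping a field such as CAMPAIGN or EVENT is that the aggregated rows can still be separated by origin afterwards. A minimal Python sketch, using made-up rows and a hypothetical "Event" column:

```python
from collections import Counter


def samples_per_group(rows, key):
    """Count how many rows share each value of the given field."""
    return Counter(row[key] for row in rows)


# Made-up rows (assumption), standing in for an aggregated table.
rows = [
    {"Event": "A", "Depth": "0"},
    {"Event": "A", "Depth": "10"},
    {"Event": "B", "Depth": "0"},
]
counts = samples_per_group(rows, "Event")
for event, n in sorted(counts.items()):
    print(event, n)
```

Without such a field, the merged collection gives no way to tell which source a sample came from.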
|25. Amazingly, because we
checked it above, ODV immediately opens with the new data table already
imported into a new collection, named as above. This is entirely
familiar territory to MDL students, so you can take it from here.
|26. Here's a map of the new
collection's stations, from ODV's map mode.
|27. And here's a scatter plot
of the primary productivity values. The pattern is entirely normal
looking and clean.
|28. To understand better
exactly what happens during this process, here are separate collections
created from DOI 765636 (above) and DOI 684809 (below). They both
contain surface values, but only the upper collection has values at depth.
The total number of samples is coincidentally 48 in both cases,
typically makes the new ODV collection in the folder just above the WORK
folder, with a very generic name. You might want to take a minute in
ODV to use the command COLLECTION > RENAME to give it a more useful name,
similar to the filenames used above.
|30. PANGAEA is an
immensely valuable resource that will tax your ability to keep up with the
new possibilities it offers. You should examine its existing data
very carefully before you spend time and money looking at things that may
already be "out there." We owe a huge debt of gratitude to the gnomes,
elves and graduate students who have done such a magnificent job.