Workflow: From Spreadsheet to Islandora

From FIG

Jump to: navigation, search
Navigation
NewFig1201.png
FL-Islandora Guides (FIG)
FL-Islandora Overview
User Interface
Permissions and users
Collection Creation and Management
Content models
Creating Content Objects
Metadata
Suppressing Objects From View
PALMM guidelines
Fl-Islandora and Mango
Site Administration
Workflow: From Spreadsheet to Islandora
Using Google Analytics with FL-Islandora
Collection Information Menu
A-I. FL-Islandora Glossary
A-II. Field Inventory

These instructions cover transforming an Excel spreadsheet of metadata to MODS so the records can be imported into Islandora along with corresponding content files using the Zip File Importer.

The outline of steps is:

a) Prepare your Excel spreadsheet of metadata
b) Use FLVC’s ExcelToMODS Transformer to convert your Excel spreadsheet to MODS
c) Import the MODS and corresponding content files with Zip File Importer


Contents

Prepare your Excel spreadsheet of metadata

Download the template Excel spreadsheet from http://exceltomods.flvc.org/docs/metadata_template.xlsx . Download this and type in your filenames and your metadata values. Only the first three columns (filename, identifier-IID, title, and date issued) are required. Other fields are optional, however, good complete metadata will help researchers and site visitors to find your items and is recommended.

Some extra instructions on which column header matches to which MODS field can be found here http://exceltomods.flvc.org/docs/mods_template_documentation.xlsx .

One content model per spreadsheet! (ie. Large Image .tifs go in a spreadsheet together, and the Basic Image .jpegs go in a different spreadsheet, and PDF content model .pdf files go in another, and so on.) When you use the ZIP File Importer to add MODS and content to Islandora, you can only zip load one kind of content model at a time. So all of the rows of your worksheet should be for the same type of file -- don’t mix up your PDFs and JP2s!

Tip: All columns in your Excel spreadsheet should be of the format “Text” (not “General”) before you begin data entry. This helps to prevent Excel from reformatting your metadata as you type. If you notice that Excel is reformatting your dates, select the whole column, right click, select "Format" in the dropdown menu, and make sure that "Text" format is selected.

Use FLVC’s ExcelToMODS Transformer to convert your Excel spreadsheet to MODS

Note: do NOT use Internet Explorer with http://exceltomods.flvc.org/ as it doesn't properly display transformation errors.

FLVC’s Excel to MODS Transformer is at http://exceltomods.flvc.org/ . For a login, contact the FLVC Help Desk [1].

After logging in, you will see a place to upload an Excel file. Browse your computer harddrive, find the Excel file of metadata you prepped. To the right of the upload slot, you will see a blank for “Output Filename”. There you can fill out the name you would like the ExcelToMODS Transformer to assign to your package of MODS files. If you leave it blank, then the transformer will assign a random name. To convert the Excel file to MODS, click the “transform” button.

TransformerBrowseFile.png

After you click the “transform” button, a green bar with the text “Submitted” will appear. In a few moments, in the “Downloads” area, either a list of error messages or a link to your package of MODS files will appear.

If you see error messages in the “Transformation Errors” section, open your Excel file on your desktop and work through the errors. When you have corrected an error, you may click in the ExcelToMODS Transformer to clear the error messages, and try uploading and transforming again. It’s a good idea to click to clear the error messages between uploads, because then you won’t see and revisit older errors that you have now fixed. If you do not click to clear older error messages, they will remain on the screen even after you upload a new file.

When you have worked through error messages. Click the “download” button to download your package of MODS files.

TransformerDownload.png

When you download the package, it will be a .zip file containing a MODS file in xml for each line in your original spreadsheet. You can then save the .zip file to your local drive. (When you click the link in your browser, the zip file may open automatically in Internet Explorer, and then you'll see a list of .xml file, which you can click and drag to a folder on your computer.)

Quality Control: Manually Review the MODS

The Excel to MODS Transformer validates the MODS files it creates. However, there are cases where although the MODS XML file is valid the contents of individual MODS elements may not be what you expect to see. This is usually caused by irregularities of data entry into the Excel file, or by Excel "helpfully" reformatting something.

Click to open a few samples of MODS XML records and skim them to confirm that the contents of elements is what you expect. Common element content problems and solutions are described below.

Problem: Dates are formatted as yyyy.0 . Ex. 2015.0 Ex. 1988.0

Cause: This has likely been caused when dates are input into an Excel column that has not been pre-formatted as "text".

Solution: In the Excel file: Cut and paste just the column of problematic numbers into Notepad. Then in Excel, format the column as text: Select the whole column, right click and "Format Cells", then choose Category "Text". Now cut and paste from Notepad back into Excel. When the date is correctly formatted as text, I see a tiny green corner in the top left of each of the Excel cells holding a yyyy date. The green arrow is an error message. You can then click the cell with the green arrow, and an exclamation point appears to the left of the cell. Click the exclamation point to see the error message, and it says “Number stored as text”. For you this is not an error, but instead exactly what you wanted to do!

Problem: Dates show random numbers

Cause: Excel has formatted your date field as a "Date" and not as "Text". It seems counterintuitive, but you want your date info to be stored as "Text" in Excel. Microsoft Excel stores dates as sequential numbers that are called serial values. For example, in Microsoft Excel for Windows, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,448 days after January 1, 1900.

Solution: Type all your dates into the spreadsheet as yyyy-mm-dd, and if you realize that Excel is reformatting these to mm/dd/yyyy, nip it in the bud. Select your whole column, right click and "Format Cells", then choose the "Text" format.

Solution: If Excel has already converted a whole bunch of dates to yyyy/mm/dd format, you can convert to yyyy-mm-dd by selecting those and formatting as date with a locale of "English (U.K.) and a Type of yyyy-mm-dd. Then you can can copy and paste all those to Notepad. Then in Excel, format the column as text: Select the whole column, right click and "Format Cells", then choose Category "Text". Now cut and paste from Notepad back into Excel. When the date is correctly formatted as text, I see a tiny green corner in the top left of each of the Excel cells holding a yyyy date. The green arrow is an error message. You can then click the cell with the green arrow, and an exclamation point appears to the left of the cell. Click the exclamation point to see the error message, and it says “Number stored as text”. For you this is not an error, but instead exactly what you wanted to do!

Problem: Line breaks within individual Excel cells cause unintended repeated MODS elements

Cause: This is caused by line breaks in Excel cells, possibly introduced via a cut and paste operation. An example of a resulting repeated MODS element will look something like this:

    <note>Part of the sentence that came before the line break,</note>
    <note> and the rest of the sentence that came after the line break.</note>

If the above is loaded into Islandora two <note> elements will display to the public.

Solution: To remove line breaks in Excel cells, do find/replace. For find you will enter the line break which looks invisible. To enter the line break, have the cursor in the “Find” part of find/replace, hold down Alt and type 010. Then for replace enter a single space. That find/replace should remove all the line breaks in cells and add spaces instead.

Make a .zip file with the MODS XML and corresponding content files

Put the unzipped MODS records and the corresponding content files into the same directory. The name of each content file, and of each .xml file should be identical, except for the .xml and .pdf/.jpg/.tif/etc part. When you upload, Isladora will use the filenames to match the XML to the file.

Right-click on the directory, then click “Send to” and “Compressed (zipped) folder”.

TransformerCreateZip.png

Windows will ask you for the .zip file name. Save the .zip file and then follow the steps in the Islandora User Manual: [[2]], section “Zip File Importer” to load the .zip file of MODS and content files into Islandora.

NOTE: the ZIP File Loader allows for upload/creation of objects in only one Content Model per .zip file.

Import the MODS files as a metadata-only load into Islandora with ZIP File Importer

You can also loads MODS files into Islandora without accompanying content files. Use cases for this workflow:

  • You want to create "parent" Compound Object objects. (The Compound Object Content Model accepts only metadata. The "child" objects are loaded separately into an appropriate Content Model and then linked to the parent object using the "Compound" function.)
  • You want to upload Book Content Model objects. Because these have multiple files - one for each page - with different file names, Islandora can't match the files to the xml. You can batch load the metadata, then come along and add pages to each Book Content Model object.
  • Your institution's workflow is such that one staff person creates and loads metadata only, and after the metadata has been finalized another staff person uploads and adds content datastreams.

History note

The Excel to MODS Transformer is periodically updated, for example to have a header referencing a more recent version of MODS. Updates are backwards compatible, and it's OK to reprocess older Excel files made from a previous template.

Prior to 2015, FSU library had circulated an Excel template, an XSLT template, and instructions for converting from Excel to MODS using Microsoft Excel on the desktop. That process is no longer supported by FLVC.

As of Summer 2018, no additional fields will be added to the Excel to MODS Transformer. The service will not be sunset, but all development work will instead be spent on Islandora.

Personal tools