Case Study: University of Alberta

Background:

After digitizing its newspaper archive and running OCR on it, the University of Alberta came to Access Innovations, Inc. for a proof of concept to identify the taxonomy that would best fit its news data. The data was in both French and English, which was a further consideration when choosing a taxonomy for indexing. Three taxonomies were tested against Alberta’s data: a news-centered taxonomy, a geography-centered taxonomy, and JSTOR’s taxonomy, which covers a wide range of topics. After extensive analysis of the indexing results, the news taxonomy was chosen as the best fit. Once the proof of concept was complete, a subset of Alberta’s data focused on the early 1900s influenza outbreak was indexed and analyzed to determine whether the indexing accurately covered this historical event.

Need:

The University of Alberta had an archive of old news articles that it wanted to sort, analyze, and search more effectively.

Solution:

After testing it against two other options, the University of Alberta found that the news-centered taxonomy worked best for its archive. The indexing algorithms were then adjusted, and another indexing run was conducted to include more terms. The university also received metrics and analyses of its data, including term frequencies and comparisons between the taxonomies.
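
As a rough illustration of the kind of metrics mentioned above, the sketch below computes term frequencies and a simple coverage figure for each taxonomy. It is not the analysis that was actually delivered: the function names, the sample terms, and the choice of coverage as the comparison measure are all assumptions made for the example.

```python
from collections import Counter


def term_frequencies(indexed_docs):
    """Count how many articles each taxonomy term was assigned to."""
    counts = Counter()
    for terms in indexed_docs:
        # set() so a term repeated within one article is counted once
        counts.update(set(terms))
    return counts


def coverage(indexed_docs):
    """Fraction of articles that received at least one taxonomy term."""
    if not indexed_docs:
        return 0.0
    return sum(1 for terms in indexed_docs if terms) / len(indexed_docs)


# Hypothetical indexing output: one list of assigned terms per article,
# produced by indexing the same three articles against two taxonomies.
results = {
    "news": [["Epidemics", "Public health"], ["Elections"], ["Epidemics"]],
    "geography": [["Edmonton"], [], ["Calgary"]],
}

for name, docs in results.items():
    most_frequent = term_frequencies(docs).most_common(1)
    print(f"{name}: coverage={coverage(docs):.2f}, most frequent={most_frequent}")
```

A taxonomy that leaves many articles without any term, or that concentrates on a handful of very frequent terms, may be a weaker fit for the archive than one with broader coverage.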

Results:

We also created a Canadian geography taxonomy for the University of Alberta to tag location data in its news articles. The finished files were delivered to the university, where students, faculty, and researchers can now use them.