Data Analysis Types for Different Data Type Sources

I wanted to list the different types of data analysis used with different types of data sources.

This list was adapted from the course “The Data Scientist’s Toolbox” by Jeff Leek, PhD, Brian Caffo, PhD, and Roger D. Peng, PhD. I am listing it here as a reminder for my future research.

Converting MARC Records to RDF

Libraries store their bibliographic records in MARC (MAchine-Readable Cataloging) format. MARC is the digitized form of the catalog cards that were used to link subjects, authors, and titles to their locations within the library.

A colleague of mine asked me to remind him how to use MarcEdit to convert MARC records to RDF. I thought I’d post the instructions here to share with everyone.

First, download and install MarcEdit. You can download the software here. On the same page, you can also download additional XSLT stylesheets that transform MARC records into other formats. Although I believe the XSLT is already installed with the software package, you should download it from there as well. Under the XSLT download bullet, select MARC => RDF Dublin Core.

You can view the author’s instructions here; read and follow them. To summarize: download the XSLT stylesheet and save it to the XSLT directory of your MarcEdit installation.

I’ve created this video to help illustrate the process from the instructions.
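MarcEdit and its stylesheet do the actual conversion. To show what a MARC-to-Dublin-Core mapping does conceptually, here is a minimal Python sketch using only the standard library. It is not the MarcEdit XSLT: it assumes the record is already in MARCXML, handles a single record, and maps only an assumed, simplified subset of fields (245 $a to dc:title, 100 $a to dc:creator, 650 $a to dc:subject). The sample record is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed, simplified subset of a MARC-to-Dublin-Core mapping (not the full XSLT)
FIELD_MAP = {"100": "creator", "245": "title", "650": "subject"}


def marcxml_to_rdf_dc(marcxml: str) -> str:
    """Convert one MARCXML <record> into an rdf:RDF string with Dublin Core elements."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    record = ET.fromstring(marcxml)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(rdf, f"{{{RDF_NS}}}Description")
    # Walk the data fields in document order and copy each mapped $a subfield
    for datafield in record.iter(f"{{{MARC_NS}}}datafield"):
        dc_element = FIELD_MAP.get(datafield.get("tag"))
        if dc_element is None:
            continue
        for subfield in datafield.findall(f"{{{MARC_NS}}}subfield"):
            if subfield.get("code") == "a" and subfield.text:
                node = ET.SubElement(description, f"{{{DC_NS}}}{dc_element}")
                node.text = subfield.text.strip()
    return ET.tostring(rdf, encoding="unicode")


# Hypothetical sample record for illustration only
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An example title</subfield></datafield>
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Metadata</subfield></datafield>
</record>"""

print(marcxml_to_rdf_dc(SAMPLE))
```

A real crosswalk covers many more fields and cleans up MARC punctuation; this sketch only shows the shape of the mapping from MARC tags to Dublin Core elements inside an RDF description.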

Creating a Project in SAS Content Categorization Studio 12.1


These steps are for creating a project in SAS Content Categorization Studio 12.1. They can be executed from the menu bar paths listed in the steps, or you can right-click the project item in the Taxonomy panel.

  1. Start SAS Content Categorization Studio (CCS)
  2. File -> New Project
  3. Project -> Add Language
  4. Select your language, then press OK
  5. Project -> Enable Categorization
  6. Category -> Add Category


Add concepts or categories below the Top concept node. That is where you build out the taxonomy for your concepts.
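SAS CCS builds the taxonomy through its GUI, so there is no code involved in the steps. Purely as a conceptual illustration (not SAS’s project format or API), a taxonomy rooted at a Top node can be modeled as a tree of categories. The `Category` class and the example category names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical category taxonomy."""
    name: str
    children: list["Category"] = field(default_factory=list)

    def add(self, name: str) -> "Category":
        # Create a child category below this node and return it for further nesting
        child = Category(name)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> list[str]:
        # Depth-first list of slash-separated paths from the root to every node
        path = f"{prefix}/{self.name}" if prefix else self.name
        result = [path]
        for child in self.children:
            result.extend(child.paths(path))
        return result


top = Category("Top")
km = top.add("Knowledge Management")
km.add("Communities of Practice")
top.add("Semantic Technologies")
print(top.paths())
```

Each category you add in the Taxonomy panel corresponds to one node in a tree like this, with Top as the root.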


Day 3 – Communities and Culture Research in Digital Libraries

My goal is to post something every day. For the third day, I am referencing a call for students and recent graduates to present their research relating to communities and cultures in digital libraries. The Special Interest Group for Digital Libraries (SIG-DL) of the Association for Information Science and Technology (ASIS&T) is seeking proposals from students to showcase their work at ASIS&T’s annual meeting in Seattle, WA, this year, October 31 to November 4, 2014. (See the Call for Proposals description.)

This topic is especially interesting to me because I just finished research for two summer courses dealing with culture assessment and fostering communities of practice. The research included social network analysis and semantic analysis of an online forum; I used the Ontolog group.

The analysis I performed was based on the participation of the users and the language used in the forum postings. The research assigned each user to the 1%, 9%, or 90% group based on his or her participation (the “90-9-1” pattern of online participation). The culture assessment used semantic analysis to extract terms that identified group members’ willingness to share with the rest of the community.
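One simple way to do the 1% / 9% / 90% assignment is to rank users by post count and cut the ranking at the top 1% and top 10% of users. The sketch below is an assumption about how such tiers could be computed, not the exact method used in this research; `participation_tiers` and the sample post list are hypothetical.

```python
from collections import Counter


def participation_tiers(post_authors: list[str]) -> dict[str, str]:
    """Assign each author to a '1%', '9%', or '90%' tier by post-count rank."""
    counts = Counter(post_authors)
    # Most active first; ties broken by name so the result is deterministic
    ranked = sorted(counts, key=lambda user: (-counts[user], user))
    n = len(ranked)
    # Integer ceiling division; the top tier always holds at least one user
    top_1 = max(1, (n + 99) // 100)
    top_10 = max(top_1, (n * 10 + 99) // 100)
    tiers = {}
    for rank, user in enumerate(ranked):
        if rank < top_1:
            tiers[user] = "1%"
        elif rank < top_10:
            tiers[user] = "9%"
        else:
            tiers[user] = "90%"
    return tiers


# Hypothetical forum: one author per post, 20 distinct users
posts = ["u0"] * 100 + ["u1"] * 50 + ["u2"] * 40 + [f"u{i}" for i in range(3, 20)]
tiers = participation_tiers(posts)
```

Note that this only sees users who posted at least once; in a real community, much of the 90% are lurkers who never post, so a full analysis would also need the membership list.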

Because of some data-mining issues, I was not able to link all users into a social network based on which forum they posted in or which message they replied to. Social network analysis of those links could show which users are posting and connecting with others in the community. Another issue was duplicated text: in forums, the original posting is often still embedded in each reply, so its content is analyzed again in every message that replies to it, which duplicates results.
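One way to reduce that duplication is to strip quoted text from each reply before analyzing it. The sketch below assumes two quoting styles, email-style lines starting with `>` and lines copied verbatim from the parent message; real forums quote in other ways too, so treat `strip_quoted_lines` as a hypothetical starting point.

```python
def strip_quoted_lines(reply: str, parent: str) -> str:
    """Remove lines of `reply` that quote `parent` (assumed quoting styles only)."""
    # Normalize parent lines, ignoring any '>' markers the parent itself contains
    parent_lines = {line.strip().lstrip(">").strip() for line in parent.splitlines()}
    parent_lines.discard("")
    kept = []
    for line in reply.splitlines():
        bare = line.strip()
        # Email-style quoting: lines that start with '>'
        if bare.startswith(">"):
            continue
        # Lines pasted verbatim from the parent message
        if bare and bare in parent_lines:
            continue
        kept.append(line)
    return "\n".join(kept).strip()


# Hypothetical parent message and reply
parent = "We should share our ontologies.\nWhat do others think?"
reply = "I agree with this proposal.\n\n> What do others think?\nWe should share our ontologies."
cleaned = strip_quoted_lines(reply, parent)
```

Matching whole lines is deliberately simple: it misses quotes that were re-wrapped or partially edited, which would need fuzzier matching.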

Future research will try to retain the relationship between users and the forums or messages they are posting to, so we can see patterns in what, and to whom, content is being contributed. As for the culture assessment, pronouns can be extracted and counted per message to identify the context in which users view themselves when they participate.
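Counting pronouns per message can be sketched in a few lines. This assumes a hypothetical split between first-person singular (“individual”) and first-person plural (“collective”) pronouns; which pronouns to track, and how to interpret the counts, are research choices rather than part of the original analysis.

```python
import re
from collections import Counter

# Assumed pronoun groups: first-person singular vs. first-person plural
PRONOUN_GROUPS = {
    "individual": {"i", "me", "my", "mine", "myself"},
    "collective": {"we", "us", "our", "ours", "ourselves"},
}


def pronoun_profile(message: str) -> Counter:
    """Count individual vs. collective first-person pronouns in one message."""
    # Lowercase and split into alphabetic tokens ("I'm" yields "i" and "m")
    tokens = re.findall(r"[a-z]+", message.lower())
    profile = Counter()
    for token in tokens:
        for group, words in PRONOUN_GROUPS.items():
            if token in words:
                profile[group] += 1
    return profile


# Hypothetical messages keyed by message ID
messages = {
    "msg-1": "I think we should share our ontology drafts.",
    "msg-2": "My own notes are not ready yet.",
}
profiles = {msg_id: pronoun_profile(text) for msg_id, text in messages.items()}
```

Keeping the counts per message, rather than per user, preserves the link back to the thread, so it can later be combined with reply relationships.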

Much more research is needed, and I might actually propose showing this research at the conference, but I am posting here to document the conference and my research. Tools I used for that research include the Html Agility Pack library for C#, NodeXL, and SAS Content Categorization Studio.

Day 2 – My Semantic Research

I just want to explain my motivation for the direction of this blog. Its purpose is to give me a public platform to speak out and develop my ideas. It is also a convenient place to collect links and resources for my research.

How will the content relate to me? I will be exploring different forms of semantic technologies for processing text and extracting metadata from it. The content will relate to the research I will be performing in my final semester as a Master of Science in Information Architecture and Knowledge Management (IAKM) student in the School of Library and Information Science (SLIS) at Kent State University (KSU).

My concentration is knowledge management, and my research interest is in semantic technologies. This fall I will be serving as a Graduate/Research Assistant to Dr. Bedford, the Goodyear Professor at Kent State, applying semantic analysis methods and techniques to her research.

Her research covers a considerable range across the knowledge management discipline. I will be posting links and related research for her, as well as research from my coursework.

In addition, this fall I will be studying for the GRE for my doctoral program applications. I don’t know if this is the path for me, but I am pursuing it. Almost all doctoral program applications require a GRE score. Moreover, I believe reasoning, writing, and data analysis are important foundations for a person’s ability to plan, implement, and operate a decision support system in the context of a knowledge organization.

I hope you enjoy this journey with me. Any participation will be appreciated.