LEADS Blog

Minh Pham, Week 1- Exploring the data

Week 1: Exploring the data

My placement is with the Repository Analytics & Metrics Portal (RAMP) project at Montana State University. Nikolaus, another LEADS fellow on the same project, has already provided a nice overview of it. Thanks, Nikolaus!

Before the bootcamp, Nikolaus and I had an online meeting with our mentor, Dr. Kenning Arlitsch, and other members of the project team. They helped us understand the project better and familiarized us with the data collected by the RAMP service. Thanks to the bootcamp, I came home with new knowledge about library science in general and metadata in particular, as well as new techniques for database management, visualization, and analysis using text mining and machine learning methods.

For week 1, I focused on exploring the data through descriptive analysis and some crude visualizations. The RAMP data covers over 50 institutional repositories (IRs) and contains over 400 million rows. Because of the size of the data and the memory constraints of my laptop, running a single R command or knitting the document can take anywhere from a couple of minutes to several hours.

I looked into working in RStudio Cloud, but the current version does not let us upload and work with a dataset as large as RAMP. For now, I have to handle results from R the old-school way, copying and pasting them one by one into a Word document, rather than using an R Notebook or R Markdown to knit all the results into a single document.
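
One common way to keep memory use flat on a table this size is to stream the rows and keep only running aggregates, instead of loading everything at once. Here is a minimal sketch of that idea, written in Python rather than the R workflow described above. The column names `repository_id` and `clicks`, the function name `summarize_clicks`, and the sample rows are all hypothetical placeholders and may not match RAMP's actual schema.

```python
import csv
import io
from collections import defaultdict


def summarize_clicks(lines):
    """Stream CSV rows and keep per-repository running totals.

    Rows are read one at a time, so memory grows with the number of
    repositories, not the number of rows. The column names
    'repository_id' and 'clicks' are hypothetical placeholders.
    """
    clicks = defaultdict(int)  # repository_id -> summed clicks
    rows = defaultdict(int)    # repository_id -> number of rows seen
    for row in csv.DictReader(lines):
        repo = row["repository_id"]
        clicks[repo] += int(row["clicks"])
        rows[repo] += 1
    return {
        repo: {
            "rows": rows[repo],
            "clicks": clicks[repo],
            "mean_clicks": clicks[repo] / rows[repo],
        }
        for repo in clicks
    }


# Tiny in-memory toy sample standing in for a real export file.
sample = io.StringIO(
    "repository_id,clicks\n"
    "ir_a,5\n"
    "ir_a,3\n"
    "ir_b,10\n"
)
summary = summarize_clicks(sample)
print(summary["ir_a"])  # {'rows': 2, 'clicks': 8, 'mean_clicks': 4.0}
```

With a real export, the same function could be given a file opened with `open(path, newline="")`. Since only per-repository totals are kept, roughly 50 IRs would mean roughly 50 small entries in memory regardless of how many rows are read. The same aggregate-while-streaming idea can be applied in R as well.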

My plan for week 2 is to refine the visualizations for aesthetics and readability, and to merge the RAMP data with other datasets to explore possible research directions.

Minh Pham



1 thought on “Minh Pham, Week 1- Exploring the data”

  1. Wow, this is great, Minh! I’m excited to see where this project goes! I’m glad to hear that you and Nikolaus have been able to make some great progress on this so far. Feel free to reach out to Jake Williams or the other LEADS PIs about how you might be able to work with this big dataset a little easier. I know he’d be happy to help!