Data-Related Mistakes I Made in Grad School
Everyone makes mistakes in grad school. Some mistakes are arguably bigger than others… like the time that I dropped my NMR tube containing the only 5 milligrams of a compound I had struggled to isolate (yes, I cried). But honestly… a lot of my mistakes stemmed from poor data management. So without further ado, here are some of the data management mistakes I made in graduate school and some of the resources that have helped me re-organize my data.
Not adopting a file naming system early enough
tldr: Create a simple file naming system that contains important identifiers… and stick to it
Files related to a specific experiment
I was advised early in my graduate school journey to name analytical data files with a specific format: my initials, notebook number in Roman numerals, and a three-digit number with leading zeros as placeholders (e.g., NYH-I-001). This method ensures that when I sort files by name on my computer, they are always in chronological order based on the respective reactions. This approach significantly simplifies managing the substantial amount of data that accumulates in graduate school. Additionally, I categorized these data files into folders according to notebooks for easy access.
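If you'd rather not type these names by hand, here is a minimal sketch of the format in Python. The function names (to_roman, experiment_id) are my own for illustration and not part of any tool mentioned in this post.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer (1-3999) to a Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("notebook number must be between 1 and 3999")
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    pieces = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        pieces.append(symbol * count)
    return "".join(pieces)


def experiment_id(initials: str, notebook: int, experiment: int) -> str:
    """Build an identifier such as NYH-I-001 (hypothetical helper)."""
    if not 1 <= experiment <= 999:
        raise ValueError("experiment number must fit in three digits")
    # :03d pads with leading zeros so names sort correctly as text
    return f"{initials.upper()}-{to_roman(notebook)}-{experiment:03d}"


print(experiment_id("NYH", 1, 1))
```

One caveat: the leading zeros keep files in order within a notebook, but Roman numerals stop sorting alphabetically after a few notebooks (IX sorts before V). Keeping one folder per notebook sidesteps that.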
Figure Files
During graduate school, I had no trouble locating and identifying my analytical data files. However, I had issues with my figure files, specifically ChemDraw files. Around the time I started graduate school, the Spotlight search function on Mac computers became significantly more powerful. When I was an undergraduate, I organized files into folders and subfolders, and gave the files non-specific names because they were stored within those highly specific folders. This method worked well until I transitioned to using the search function exclusively, only to discover that I had ten different "Figure001" options that I could only distinguish by opening each one or navigating through my nested file system.
In hindsight, I wish I had named my files based on a project number and included some sort of identifier in the name.
For example:
Let's say I'm working on my first project, creating the initial figure for the NSF GRFP fellowship application. I could name the ChemDraw file as "001_Fig001-NSFGRFP.cdxml" for easy access in the future.
Sometimes, I needed to create similar figures for various purposes (e.g., paper figures, fellowship applications, presentations, or posters). To prevent duplicate files, I have now adopted a more general naming system that reflects the content of the figure. For example, if I create a figure containing my graphical abstract for my first project, I might name it "001_abstract_paper." If it's intended for a poster, I could change it to "001_abstract_poster."
Furthermore, when collaborating on a project and sharing files, it's good practice to include your initials in the file name to avoid confusion!
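Here is a rough sketch of that figure naming pattern in Python. The function name, the position of the initials at the end, and the default .cdxml extension are all my assumptions for illustration; adjust them to whatever convention your group uses.

```python
import re


def figure_name(project: int, content: str, purpose: str,
                initials: str = "", ext: str = ".cdxml") -> str:
    """Build a name like 001_abstract_paper.cdxml (hypothetical helper)."""

    def slug(text: str) -> str:
        # lowercase, and replace runs of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [f"{project:03d}", slug(content), slug(purpose)]
    if initials:
        # assumption: initials go last; the post does not say where
        parts.append(initials.upper())
    return "_".join(parts) + ext


print(figure_name(1, "abstract", "paper"))
print(figure_name(1, "abstract", "poster", initials="nyh"))
```

Because the project number comes first, searching for "001_" pulls up every figure for that project, and the content and purpose fields tell the files apart without opening them.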
Folders
I’ve also created a folder system that stems from Tiago Forte’s book and website Building A Second Brain. I highly recommend reading this book if you’re overwhelmed with your “knowledge management” system. This book gives insight into how to use the different folders I refer to below. In brief, I use the PARA folder system to organize data. It’s helped me keep track of where my files should go!
This breaks down to:
Projects - any project that I’m actively working on
Areas of focus - Think medical documents, taxes, pets, etc.
References - Journal articles, templates for ChemDraw, group manuals, textbooks, etc.
Archive - Stuff that I am not actively working on or don’t have the will to delete!
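If you want to set up the same four folders in one go, a minimal sketch in Python could look like this. The folder names are the ones I listed above; the create_para function and the "PARA-demo" location are made up for the example.

```python
from pathlib import Path

# Folder names as used in this post
PARA_FOLDERS = ("Projects", "Areas", "References", "Archive")


def create_para(root):
    """Create the four top-level folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in PARA_FOLDERS:
        folder = root / name
        # exist_ok=True makes this safe to re-run on an existing setup
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # hypothetical location; change it to wherever you keep documents
    for folder in create_para(Path.home() / "PARA-demo"):
        print(folder)
```

Running it a second time does nothing harmful, since existing folders are left untouched.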
Not upgrading my laptop before I started graduate school
tldr: Get the best possible computer that you can afford. It is worth it. If you want to read about my experience not upgrading my laptop then feel free to read below!
Why you should upgrade your laptop:
I went into grad school with a 2014 MacBook Air with a measly 120 GB of storage. Before I started, I couldn’t really conceptualize how much data I would be accruing in graduate school, or the amount of processing power I would need, but let’s just say I filled up my hard drive pretty quickly.
My PI chose to provide desktop computers to every graduate student because he wanted to ensure that everyone had a Mac for ease of file sharing. I did most of my work on my desktop, so it didn’t feel like a huge deal that my laptop wasn’t all that great. I was also lucky that UC Davis offered unlimited Google Drive storage, so I could keep important (but less frequently accessed) documents in the cloud, and also use Google’s Backup and Sync app to select files and folders to store locally and sync to the cloud. This worked super well for me most of the time. I was able to start working on something on my work computer, save it, and by the time I biked home it would be synced and I could work on it from the comfort of my couch.
Unfortunately, this led to a ton of issues when Google decided to try to sync all 70 GB of my Google Drive files to my laptop. I ended up transferring all of my files to a 1 TB hard drive and re-syncing my selected folders to my laptop. This somehow managed to happen a total of 3 times throughout my graduate career. That doesn’t sound like much… but I ended up with 3 duplicate backups in various stages of syncing on my external hard drive that I was too scared to delete on the off chance that somehow they contained the only copy of a critical piece of data for my dissertation. I graduated two years ago and I finally finished getting rid of duplicates during my postdoc. (If you ever have a problem with duplicates, use the app Gemini. It’s awesome!)
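If you're curious how duplicate hunting can work in principle, here is a minimal sketch in Python that groups files with identical contents. To be clear, this is not how Gemini works (I don't know its internals), and the function names are my own. It only reports duplicates; it never deletes anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are identical."""
    # Step 1: group by size, since files of different sizes cannot match
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Step 2: hash only the files that share a size with another file
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling find_duplicates on the folder that holds your backups returns groups of matching files, which you can review by hand before deciding what is safe to remove.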
Not taking good enough literature notes
tldr: make sure you have a way to remember why you downloaded certain papers
When I was in grad school, sometimes I would download a paper and several weeks later I would have NO idea why I downloaded it. During my postdoc, I read the book How to Take Smart Notes by Sönke Ahrens. It changed the way I approached reading papers and made literature searching so much more enjoyable and productive! My biggest takeaway from this book is that the best way to take notes is to always rewrite them yourself. Writing a summary in your own words will help you understand the paper much more than if you highlight some text and call it a day. With that being said, I do also use color-coded highlighting to supplement my note-taking!
In short, I highly recommend writing notes as you read papers so that you are able to keep track of the main topics of papers and remind yourself why you saved them. I use the reference manager Zotero synced with the Obsidian notes app using a modified version of the workflow described here. I’ll probably go into more detail on this in the future!
Some of the things I did right:
Though I did make a lot of mistakes, I also kept my computer and workflow highly organized. Here are some of the things that I did to maintain that:
Kept a clean desktop - I still only keep important or useful folders on my desktop and that’s it. No random files, folders, or apps!
Gave every document a place to live - I always take the time to organize random documents and downloads into an appropriate location within a week. That leads into my next point…
Cleared my downloads folder regularly - I never let this folder become overwhelmed with junk. I generally cleared it weekly!
Backed up my data - When used correctly, cloud storage is one of the best ways to ensure you don’t lose data! I backed up my computers to a solid-state drive every week or so (whenever my computer would tell me it had been a while) and used Google Drive and iCloud to store the rest of my documents. I’m in the Apple ecosystem so I was able to access files between my phone, iPad and Mac pretty much whenever or wherever I wanted!
TURNED ON AUTOSAVE - I used Microsoft’s AutoSave feature with OneDrive to make sure that I never lost any document that was important to me. I wrote my entire dissertation on OneDrive, and continued to use OneDrive for documents throughout my postdoc.
I could go on and on about productivity and knowledge management, but I think this is enough for now. Hope my failures can ultimately help someone else make fewer mistakes than I did!
Note: I get commissions for purchases made through links in this post. Thanks for reading!