Many kinds of content could go into a README, and there are multiple audiences to consider: technical readers who may want to reproduce or extend your work, and non-technical readers who want to know what it accomplishes. Your README is what communicates what's going on in your project, so it should grow with your project and be updated as the project progresses. In this blog post, I'll discuss what to pull from each step of the OSEMN data science process (Obtain, Scrub, Explore, Model, iNterpret) to show off in your README. If you have deployed your model, state that in an introductory paragraph and include a link.

Obtain

In this part of the README, communicate your data sources. For a technical audience, this matters because someone may want to use the same source data, or may have questions about how to work with it in the first place. For a non-technical audience, showing that your data was sourced ethically, and perhaps creatively, will increase their buy-in to what you have to say. Including this section also demonstrates your skill at clearing one of the biggest hurdles in starting a data science project: getting the data.

For a visual component, include a sample of the obtained dataset, such as its first few rows (the head of the table) or, if you are working with image data, one of the raw images. This sample gives the reader a starting point, and the visuals that follow will let them see how you think as a data scientist. Showing the progress made during a project and the transformations that take place demonstrates competency and can show off your unique skill set.
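If your README is written in Markdown (an assumption; this post doesn't name a format) and your data is tabular, one way to include the head is to paste it in as a Markdown table. The sketch below uses only the Python standard library; the function names `head_as_markdown` and `_cell` and the sample data are hypothetical. If you already use pandas, `DataFrame.head().to_markdown()` does something similar but needs the optional `tabulate` package installed.

```python
import csv
import io


def _cell(value: str) -> str:
    # Escape pipes so a cell value cannot break the Markdown table layout.
    return value.replace("|", "\\|")


def head_as_markdown(csv_text: str, n: int = 5) -> str:
    """Render the header row plus the first n data rows of CSV text as a Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1 : n + 1]
    lines = [
        "| " + " | ".join(_cell(h) for h in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        lines.append("| " + " | ".join(_cell(v) for v in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample standing in for the dataset you obtained.
    sample = "city,price,rooms\nSeattle,750000,3\nBoston,680000,2\nAustin,450000,4\n"
    print(head_as_markdown(sample, n=2))
```

The printed table can be pasted directly into the Obtain section, where it renders on sites that support Markdown tables.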

Scrub

Here, explain the unique steps you took to prepare the data for analysis. For the technical audience, describe the specific methods you used, for example how you handled missing values, duplicates, or outliers. Again, this documents your process and lets others see how you worked through particularly tricky parts of the data. For the non-technical audience, explain how you know that your cleaned data is still representative of the original input data. This discussion showcases how you overcome barriers to data analysis.
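One way to support the claim that cleaned data is still representative is to compare a simple summary before and after cleaning and report the difference. The sketch below compares how a categorical column is distributed; the functions, the `region` column, and the sample rows are all hypothetical, and a small shift is only one piece of evidence, not proof of representativeness.

```python
from collections import Counter


def category_shares(values):
    """Fraction of rows in each category (assumes a non-empty list)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def max_share_shift(raw, cleaned):
    """Largest absolute change in any category's share between raw and cleaned data."""
    before, after = category_shares(raw), category_shares(cleaned)
    categories = set(before) | set(after)
    return max(abs(before.get(c, 0.0) - after.get(c, 0.0)) for c in categories)


if __name__ == "__main__":
    # Hypothetical (region, price) rows; cleaning drops rows with a missing price.
    raw_rows = [("north", 310), ("south", None), ("south", 280), ("east", 450),
                ("north", None), ("south", 295), ("east", 430), ("north", 305)]
    cleaned_rows = [row for row in raw_rows if row[1] is not None]
    raw_regions = [region for region, _ in raw_rows]
    clean_regions = [region for region, _ in cleaned_rows]
    shift = max_share_shift(raw_regions, clean_regions)
    print(f"largest shift in region share after cleaning: {shift:.1%}")
```

A one-line statement such as "dropping rows with missing prices changed no region's share by more than N percentage points" is easy for a non-technical reader to follow.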

Explore

In this section, your project should truly begin to shine. Include anything notable about features you engineered or added. This again shows off your creativity and problem-solving ability, and it is where you can start to stand out from the crowd of other data scientists. Plus, the exploration section is visual heaven: the visuals you can create are limited only by your knowledge and proficiency. Be sure to explain any difficult-to-interpret graphical relationships for your non-technical audience.

Include several visuals. This section is likely what will draw both audiences to look deeper into your project and what you've accomplished.

Model

Start by restating the purpose of the project to connect with your non-technical audience, along with a statement of whether the model succeeded or failed. Then include all of the details you would need to talk about this project in a technical interview: how many models were evaluated, the criteria used to select the best model, and explanations of the parameters you chose. This is the meat of your project, and the length of this part of your README should reflect that.
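Because this section should state how many models were evaluated and why the best one won, a compact comparison table is one way to show that at a glance. The sketch below assumes you have already collected each candidate's validation scores; `ModelResult`, `comparison_table`, the two metrics, and every number in the example are hypothetical placeholders, not results. Rank by whichever criterion you actually used to choose your model.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    params: str          # short summary of the selected hyperparameters
    val_accuracy: float  # validation accuracy
    val_f1: float        # validation F1 score


def comparison_table(results, key="val_f1"):
    """Markdown table of all evaluated models, ranked by `key`, with the best one bolded."""
    ranked = sorted(results, key=lambda r: getattr(r, key), reverse=True)
    lines = [
        f"Models evaluated: {len(results)} (ranked by {key})",
        "",
        "| Model | Parameters | Val. accuracy | Val. F1 |",
        "| --- | --- | --- | --- |",
    ]
    for i, r in enumerate(ranked):
        name = f"**{r.name}**" if i == 0 else r.name
        lines.append(f"| {name} | {r.params} | {r.val_accuracy:.3f} | {r.val_f1:.3f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    results = [
        ModelResult("Logistic regression", "C=1.0", 0.81, 0.78),
        ModelResult("Random forest", "n_estimators=200, max_depth=8", 0.86, 0.84),
        ModelResult("Gradient boosting", "learning_rate=0.1", 0.85, 0.85),
    ]
    print(comparison_table(results))
```

Stating the ranking criterion in the table's first line makes the selection logic explicit, which is exactly what an interviewer is likely to ask about.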

For this section's visual, a graph of loss, accuracy, or another training metric lets technical, and possibly non-technical, audiences get a quick look at your model's performance. That quick glance will again draw readers into the rest of your write-up to see how you achieved that performance, which may be higher than their own. Communicating the steps you took to get there also builds up knowledge within the data science community.

Interpret

This section could cover a myriad of different things. If your model has been deployed, this is a great place to talk about the process that made that happen. Be sure to restate your business case and explain how your model directly addresses it. A general conclusion, along with any future work planned for the project, is also a good idea. In fact, this might be where your README starts: with the business case and the planned next steps.

In summary, your README should grow along with your project and be updated regularly. Make sure the writing and concepts within it are accessible to audiences at all levels of business and coding proficiency. That will likely include explaining the logic behind each decision you made along the way. Telling this story of your logic can also be a great way to prepare for technical interviews. This is what I'll strive to do in every README that I write or contribute to.