Mise en Place for Data Science

By Curt Bergmann

When guests arrive at a great restaurant, the chef and all the cooks have
already planned and assembled everything they need to deliver excellence on a
plate quickly. Their process, called mise en place, is used by chefs all over
the world. It emerged after French chef Georges Escoffier introduced his system
of cooking in the early 20th century, and it provides a repeatable process that
rewards efficiency and excellence. Adapting Dan Charnas's ideas about working
clean [1], we too can improve our chances of delivering excellence by planning
our steps, assembling our resources, keeping our project folders clean, and
documenting our process before finishing for the day, all while checking for
errors at every opportunity.

Planning

When we start any Data Science project, we have a plan in mind. But how many of us take the time to write it down? Students at the Culinary Institute of America learn to do this from the very beginning (Charnas, 263). They not only write down their steps but also sequence them in an efficient order. In the fast pace of service, they can then rely on their plan to get them through the day efficiently and with fewer mistakes. It's time for us to do the same. For example, when we start an exploratory data analysis (EDA), we, too, need a plan: identify the data of interest, understand what each record describes, summarize each column, and so on. Our first step is to request access to the data and, while we wait for permissions and access methods, prepare for the analysis. You might argue that it's trivial to remember to make this request. But if you're asked on a busy Monday to start this work on Wednesday, you might think you have two days before you need to act. By Wednesday, when the analysis should begin, you'll already be two days late in requesting access. It's better to plan early so that your permissions and access are in place before it's time to analyze the data.
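Writing the plan down can be as simple as a text file, but the sequencing part lends itself to a small sketch. The Python below assumes Python 3.9 or later for the standard-library graphlib module. It records a hypothetical EDA plan as steps mapped to their prerequisites and lets TopologicalSorter produce an order in which every step comes after the steps it depends on. The step names and the EDA_PLAN and sequence_plan names are illustrative assumptions, not a prescribed workflow.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical EDA plan: each step maps to the set of steps that must finish first.
# The step names are illustrative assumptions, not a prescribed workflow.
EDA_PLAN = {
    "request data access": set(),
    "set up project folder": set(),
    "write loading script": {"set up project folder"},
    "load data": {"request data access", "write loading script"},
    "understand what each record describes": {"load data"},
    "summarize columns": {"understand what each record describes"},
}


def sequence_plan(plan):
    """Return the plan's steps in an order that respects every prerequisite."""
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    for number, step in enumerate(sequence_plan(EDA_PLAN), start=1):
        print(f"{number}. {step}")
```

Because the access request has no prerequisites, it lands in the first batch of ready steps, so it is scheduled for Monday rather than discovered on Wednesday. The exact order among independent steps is up to the sorter.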

Identifying Resources

When a chef prepares a menu, they can rely on the type of restaurant and their
own tastes to narrow the list of dishes. Similarly, consider identifying a list
of code snippets that you can add to your projects as you need them, just like
adding ingredients to a recipe you're building. Of course, any good chef adjusts
a recipe depending on which ingredients are in season. We can do the same,
knowing that some data and projects have different needs.
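One way to keep such a list is a small module of reusable helpers, registered by name so that each project pulls in only what it needs. The sketch below is a hypothetical Python registry. The snippet decorator, the SNIPPETS dictionary, the pick function, and the two example helpers are illustrative assumptions, not a standard library or the author's own toolkit.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical personal snippet library: reusable helpers registered by name
# so each project can pick only the "ingredients" it needs.
SNIPPETS: Dict[str, Callable] = {}


def snippet(name: str) -> Callable[[Callable], Callable]:
    """Decorator that registers a function in the snippet library under `name`."""
    def register(func: Callable) -> Callable:
        SNIPPETS[name] = func
        return func
    return register


@snippet("count_missing")
def count_missing(rows: Iterable[dict], column: str) -> int:
    """Count rows whose value in `column` is absent, None, or an empty string."""
    return sum(1 for row in rows if row.get(column) in (None, ""))


@snippet("distinct_values")
def distinct_values(rows: Iterable[dict], column: str) -> List:
    """Return the sorted distinct non-missing values of `column`."""
    return sorted({row[column] for row in rows if row.get(column) not in (None, "")})


def pick(*names: str) -> Dict[str, Callable]:
    """Select the snippets a project needs; an unknown name raises KeyError."""
    return {name: SNIPPETS[name] for name in names}
```

A project that only needs these two helpers would call `pick("count_missing", "distinct_values")` and use the returned functions, while a different project could register and pick a different set, much as a chef swaps in seasonal ingredients.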

Working Clean

At the end of the day, the chef puts all of their reusable ingredients in
containers, labels them, and puts them away for tomorrow, throws away unusable
leftovers, and puts the dirty dishes in the dishwasher (Charnas, 11). We have
our own unusable leftovers and dirty dishes, such as abandoned scripts and
temporary output. By deleting them, we won't be confused tomorrow or next year
when we reopen the project folder. We label everything we keep by checking our
code in to Git and pushing it to GitHub. As scientists, we also need to go
further than the cook and document everything we did. If we don't document it,
it didn't happen. Then, when the client asks questions about our analysis, we
have notes to help us answer them, or we can show unreported intermediate
results that support the decisions we made during the analysis. But don't wait
until the end of the day. If you work clean as you go and document in near real
time, you will have less to do at the end of the day.
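Checking in and pushing are handled by Git itself, but the two chores around them, clearing out scratch output and keeping a running journal, can be scripted. The Python sketch below assumes a hypothetical convention: scratch files match `*.tmp` or `scratch_*`, and the journal is a JOURNAL.md file in the project root. The function names are also illustrative.

```python
import datetime
from pathlib import Path
from typing import Iterable, List

# Hypothetical conventions for this sketch: scratch files match these glob
# patterns, and the project journal is a plain-text file in the project root.
SCRATCH_PATTERNS = ("*.tmp", "scratch_*")
JOURNAL_NAME = "JOURNAL.md"


def find_scratch(project: Path, patterns: Iterable[str] = SCRATCH_PATTERNS) -> List[Path]:
    """List files anywhere under `project` that match a scratch pattern."""
    found = set()
    for pattern in patterns:
        found.update(p for p in project.rglob(pattern) if p.is_file())
    return sorted(found)


def clean_scratch(project: Path, dry_run: bool = True) -> List[Path]:
    """Delete scratch files, or only report them when dry_run is True."""
    targets = find_scratch(project)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


def journal(project: Path, note: str) -> Path:
    """Append a timestamped note to the project journal and return its path."""
    path = project / JOURNAL_NAME
    stamp = datetime.datetime.now().isoformat(timespec="minutes")
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {stamp}: {note}\n")
    return path
```

Defaulting to a dry run is a deliberate choice: you see the list of "leftovers" before anything is thrown away. Calling `journal(project, "Dropped 12 duplicate rows; see the loading script")` right after making a decision is what documenting in near real time looks like in practice.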

Working clean allows us to concentrate on the good stuff: our R scripts,
reports, or PowerPoint slides. Just as a chef tastes sauces and other
components before assembling them on the plate, we need to check our data at
every step, starting with the very first load. Did we load all of the records?
Are any records duplicated, and if so, is that expected? Is any data missing
from the middle of a sequence of date-dependent records? By checking at every
step, we can fix problems sooner and at lower cost.
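The three questions above translate directly into code. The sketch below uses only the Python standard library and assumes rows arrive as dictionaries, that the expected record count is known from the source (for example, a count reported by the source system), and that the date field holds ISO dates with one record per calendar day. The check_load name and its parameters are hypothetical. In pandas, `DataFrame.duplicated` and `pd.date_range` cover the same ground.

```python
import datetime
from collections import Counter
from typing import Dict, Sequence


def check_load(rows: Sequence[dict], expected_count: int, key: str, date_field: str) -> Dict[str, list]:
    """Run three first-load checks and return only the ones that found something.

    - "count": [expected, actual] when the row count differs from expected_count
    - "duplicates": key values that appear more than once
    - "date_gaps": days missing between the earliest and latest date
    """
    issues: Dict[str, list] = {}

    # 1. Did we load all of the records?
    if len(rows) != expected_count:
        issues["count"] = [expected_count, len(rows)]

    # 2. Are any records duplicated? Whether that is alright is a judgment call.
    counts = Counter(row[key] for row in rows)
    dupes = sorted(value for value, n in counts.items() if n > 1)
    if dupes:
        issues["duplicates"] = dupes

    # 3. Is data missing in the middle of a date-dependent sequence?
    #    Assumes ISO dates (YYYY-MM-DD) and one expected record per day.
    dates = {datetime.date.fromisoformat(row[date_field]) for row in rows}
    if dates:
        first, last = min(dates), max(dates)
        span = (last - first).days + 1
        missing = [first + datetime.timedelta(days=i) for i in range(span)
                   if first + datetime.timedelta(days=i) not in dates]
        if missing:
            issues["date_gaps"] = missing

    return issues
```

An empty result means all three checks passed. Anything else is a prompt to investigate before the problem travels further down the pipeline.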

Assuring Quality

Finally, before we turn over our analysis, we should take one last look at what
we are delivering. Just as the chef ensures that the completed meal is
delicious and pleasing to the eye, we need to ensure that the insights are
explained clearly with text, tables, and figures. Of course, that is only the
first part of the final check. Peers (other data scientists) and colleagues
(other team members) should also review our work for quality and for
suitability to the audience. Just as some guests don't like spicy food, some
clients won't understand a histogram, so we may need to adjust our final
results to suit a particular client.

Producing Excellence

Once we've delivered the final result, it's time once again to work clean,
document, put away our project, and start the next one. We can feel confident
that we've performed our analysis efficiently because we planned every step, we
had all of our resources ready when we needed them, and our project is
reproducible because we got rid of perishables, checked in all of our code, and
documented the project. Our client received a deliverable built with efficiency
and quality in mind at every stage. Let's take advantage of the hundred years
of refinement behind mise en place and put it to use to help us deliver
excellence.

A Mise en Place Inspired Checklist

If we apply this checklist to every Data Science project, we can feel confident
in delivering results with efficiency and quality built into every step.

1. Does your project have a plan?

2. Does your plan include a list of resources?

3. Is all of your code checked in?

4. Do you have built-in quality checks?

5. Do you have a project journal?

6. Is your work reproducible?
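Some of these questions can be answered automatically. The Python sketch below assumes a hypothetical convention of PLAN.md, RESOURCES.md, and JOURNAL.md files in the project root, and it runs `git status --porcelain` to test whether anything is modified or untracked. Questions 4 and 6 still need a person's judgment, so the sketch leaves them out. The file names and function names are illustrative assumptions.

```python
import subprocess
from pathlib import Path
from typing import Dict, Optional

# Hypothetical file conventions for this sketch; rename to match your own projects.
CHECKLIST_FILES = {
    "Does your project have a plan?": "PLAN.md",
    "Does your plan include a list of resources?": "RESOURCES.md",
    "Do you have a project journal?": "JOURNAL.md",
}


def code_checked_in(project: Path) -> Optional[bool]:
    """True if Git reports nothing modified or untracked, False otherwise
    (including when the folder is not a Git repository), None if Git is missing."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=project, capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # the git executable is not installed
        return None
    return result.returncode == 0 and result.stdout.strip() == ""


def run_checklist(project: Path) -> Dict[str, Optional[bool]]:
    """Answer the automatable checklist questions for one project folder."""
    answers: Dict[str, Optional[bool]] = {
        question: (project / filename).is_file()
        for question, filename in CHECKLIST_FILES.items()
    }
    answers["Is all of your code checked in?"] = code_checked_in(project)
    return answers


if __name__ == "__main__":
    for question, answer in run_checklist(Path(".")).items():
        print(f"{question} {answer}")
```

An empty `git status --porcelain` output means nothing is modified or untracked, but it does not show whether commits have been pushed to GitHub; that would need a separate check against the remote.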

References

[1] Charnas D. Everything in Its Place: The Power of Mise-En-Place to
Organize Your Life, Work, and Mind. Rodale; 2017.
