

Primary data collection and management

This chapter considers data management tools and best practices for using primary data to estimate the impact reduction potential of interventions and track progress in the supply chain.

Chapter Overview

To refine their baselines and track changes related to specific interventions, many companies turn to primary data, which more accurately describes the practices used in their own supply chains and operations. 



Companies have access to a wide range of tools that leverage primary data to generate emission factors representative of their supply chains. However, without high-quality representative data, good data management practices and careful analysis of results, tools will do little to help companies achieve their ultimate goal: reducing impacts.



To set yourself up for success from the get-go, we’ve outlined the seven essential steps companies should take, once they’ve identified the type of data they need, to properly collect, process and utilize primary data. 

Note: Tools can help companies leverage primary data to generate emission factors that are more representative of their supply chains, but if the quality of the data going into a tool is poor, so too will be the results it produces. This can undermine a business’s sustainability ambitions, because these results are the information companies use to shape and adapt their sustainability strategies, roadmaps and goals. To ensure they’re on the right path, and that the significant time, budget and staffing resources that went into collecting primary data were well spent, it’s critical that the data collection process be well managed. 

1. Choose an appropriate tool

There are many tools out there that companies can use to leverage primary data. To ensure you choose the most appropriate one for your needs now and in the future, there are a few important things to consider, such as its capabilities, scope and more.


Tools can be classified into two main types: 1) ones that only collect primary data and 2) ones that support primary data collection and environmental impact assessment.

Type 1: Primary data collection only tools

This type of tool is used strictly to collect data. Typically in the form of a survey or questionnaire, data collection-only tools exist in a variety of formats that range from the simple (e.g., paper forms or Excel sheets) to the sophisticated (e.g., computer applications or web-based platforms). In some cases, primary data collection can be aided by satellite imagery or remote sensing tools, which allow companies to gather data at scale without having to be on the ground.

Type 2: Primary data collection and environmental impact assessment tools

With these tools, data collection via a survey or questionnaire is integrated with the tool’s embedded model to generate results. Results are often displayed through a digital service, such as an application or an online dashboard.

If you’ve already done some primary data collection using one tool and decide to switch to another, it may be possible to incorporate the data you’ve already collected.

Scope, relevance + limitations

When choosing a tool, it’s important to understand its scope. Does it cover the necessary information to generate relevant results/primary emission factors? Are there any gaps or limitations? If so, you risk being unable to generate results robust enough to build an accurate baseline. 


We recommend  using a tool that provides comprehensive results across multiple environmental impact indicators, including GHG emissions, water withdrawal, land occupation, etc. A holistic overview of your environmental impacts can support decision making and help you prioritize actions, while also preventing the unintentional transfer of impacts or trade-offs that can occur when companies focus solely on one outcome, which is most often climate change impacts.


When selecting a tool, it’s not enough to consider your current needs; you also need to anticipate your needs in the future. Some things to consider: 

  • Can the tool accommodate methodology updates? 
  • Will your primary data be saved or stored somewhere so that it can be reused? If you decide to switch tools, will you be able to transfer your primary data? 
  • Does the tool maintain a changelog or documentation of updates so that users can separate “real changes” in impact reduction from tool or methodological changes?
  • Are there any intellectual property or commercial issues that might impede future work? 

Transparency + up-to-date methodologies

Transparency is a hallmark of credible assessment. It ensures scientific and strategic integrity. When choosing a tool, it’s important to fully understand if the tool is transparent and up-to-date with the latest science and methodologies. Understanding the data, calculations and assumptions behind a tool is critical for interpreting results, finding and fixing errors, and ensuring updates can be made. Tools developed and used internally tend to be fully transparent. Companies relying on external tools, however, need to ensure they have access to documentation on the tool’s methodology. 


For tools that combine primary data collection and environmental impact assessment, users need to be able to interpret the results and know if they can be compared to, combined with, or generally aligned with results obtained using other tools.



Key things to look out for are: 

  • General modeling approaches (e.g., which version of global warming potential (GWP) values are used, are these the most up-to-date IPCC GWP 100 values?)
  • Key assumptions and default values (e.g., is organic matter assumed to have a fixed dry matter content, is this relevant for all cases?)
  • Allocation (e.g., is economic or mass-based allocation applied and is this aligned with the broader carbon accounting approach for the baseline?)
  • System boundaries (e.g., how is waste, compost, and recycled content modeled? is it using a cut-off approach or not?)
  • Removals (e.g., is carbon sequestration clearly differentiated from emissions or simply subtracted, making it difficult to interpret?) 
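To illustrate why these modeling choices matter, here is a minimal Python sketch showing how the choice of GWP characterization factors and of allocation method changes the numbers a tool reports. The inventory values are illustrative and the GWP100 factors approximate; check your tool’s documentation for the exact IPCC values it uses.

```python
# Illustrative: the same inventory yields different footprints depending on
# GWP values and allocation method. All numbers are examples, not references.

# Emissions from one process (kg of each gas)
inventory = {"CO2": 1000.0, "CH4": 5.0, "N2O": 0.5}

# Approximate GWP100 characterization factors from two IPCC assessment reports
GWP_AR5 = {"CO2": 1, "CH4": 28, "N2O": 265}
GWP_AR6 = {"CO2": 1, "CH4": 27, "N2O": 273}

def co2e(inv, gwp):
    """Total footprint in kg CO2e under a given set of GWP factors."""
    return sum(mass * gwp[gas] for gas, mass in inv.items())

def allocate(total, shares):
    """Split a process footprint between co-products proportionally to shares
    (mass-based if shares are masses, economic if shares are revenues)."""
    s = sum(shares.values())
    return {k: total * v / s for k, v in shares.items()}

# Two co-products (e.g., milk and meat from a dairy system)
mass_shares = {"milk": 900.0, "meat": 100.0}              # kg
value_shares = {"milk": 0.4 * 900.0, "meat": 3.0 * 100.0}  # revenue

total = co2e(inventory, GWP_AR6)
print(co2e(inventory, GWP_AR5), co2e(inventory, GWP_AR6))
print(allocate(total, mass_shares))
print(allocate(total, value_shares))
```

Even in this toy case, the AR5 and AR6 totals differ slightly, and the share of the footprint assigned to each co-product shifts substantially between mass-based and economic allocation, which is why these choices must be aligned with the broader carbon accounting approach for the baseline.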

2. Identify a statistically representative sample

Time, budget and personnel are often limiting factors when it comes to collecting primary data. Often, it’s not feasible to collect information from the entire population (e.g., every farm or processing facility), so it’s important to define a sample that statistically represents the relevant population. Done right, the results you generate will be robust: your baseline will accurately reflect reality and your hotspots will be correctly identified, ensuring that your roadmap/action plan leads to effective environmental impact reductions. Done poorly (e.g., choosing a sample composed only of top performers), the results won’t be representative and may expose you to greenwashing accusations. 


In some cases, technology can eliminate the need for sampling because data can be gathered directly at scale. For example, satellite imagery and remote sensing have enabled rapid and comprehensive data collection that may capture the full coverage of a population.


Note that variability and disruptions (e.g., changes in yield due to extreme climate events) may skew the representativeness of the results. To mitigate this, we recommend averaging data collected across multiple years.
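As a sketch of what a statistically representative sample can look like in practice, the following uses Cochran’s sample-size formula with a finite-population correction, then draws the sample at random rather than hand-picking respondents. The population size, confidence level and farm names are all illustrative.

```python
import math
import random

def sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Cochran's formula with finite-population correction.

    z=1.96 corresponds to 95% confidence; p=0.5 is the conservative
    (maximum-variance) assumption when the true proportion is unknown.
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# e.g., how many of 2,000 supplier farms to survey for +/-5% at 95% confidence
n = sample_size(2000)

# Draw the sample at random; selecting only top performers would bias
# the baseline and invite greenwashing accusations.
farms = [f"farm_{i}" for i in range(2000)]
random.seed(42)  # fixed seed so the selection is reproducible/auditable
sampled = random.sample(farms, n)
```

In practice the sample is often stratified as well (by region, farm size or production system) so that each relevant sub-population is represented, but the arithmetic above gives the order of magnitude of the effort required.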

3. Provide proper training to the data collection team

Before deploying a tool, it’s critical to train the people who will use it (e.g., local partners, producers or suppliers) on key terminology and how to correctly input information, including units, time span, and any relevant assumptions. For example, when collecting information on fertilizer application, special attention should be given to: regional preference for imperial or metric units; application rates representing either an annual average or a given crop cycle that may be shorter or longer than a year; and assumptions about fertilizer composition (e.g., NPK ratio). Though these parameters may seem simple and straightforward, they are a common source of error in primary data collection that can have significant implications for the overall quality of data and the results they produce. 
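As an illustration of how these unit and composition assumptions play out, here is a small sketch that normalizes application rates reported in either imperial or metric units and splits a rate into nutrient masses from an NPK label. The conversion factors are standard; the survey values and label are hypothetical.

```python
# Normalize fertilizer application rates reported in mixed units, then derive
# nutrient masses from an NPK label. Example values are hypothetical.

LB_PER_ACRE_TO_KG_PER_HA = 0.453592 / 0.404686  # kg per lb / ha per acre ~= 1.121

def to_kg_per_ha(rate, unit):
    """Convert an application rate to kg/ha; reject units we don't recognize."""
    if unit == "kg/ha":
        return rate
    if unit == "lb/acre":
        return rate * LB_PER_ACRE_TO_KG_PER_HA
    raise ValueError(f"unknown unit: {unit}")

def npk_nutrients(rate_kg_ha, label):
    """Split an application rate into N, P2O5 and K2O mass using the label
    percentages (e.g., '15-15-15' means 15% of each by mass)."""
    n, p, k = (float(x) / 100 for x in label.split("-"))
    return {"N": rate_kg_ha * n, "P2O5": rate_kg_ha * p, "K2O": rate_kg_ha * k}

# A grower reporting 200 lb/acre of a 15-15-15 blend:
rate = to_kg_per_ha(200, "lb/acre")
print(npk_nutrients(rate, "15-15-15"))
```

Recording the raw value, the unit and the label separately, and converting centrally rather than asking respondents to convert, removes one of the most common sources of order-of-magnitude errors.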


Data collection can be time-intensive and needs to be done repeatedly (e.g., every 1-5 years) in order for companies to track progress effectively, so it’s a big ask for already busy producers, suppliers and farmers. Providing insight into the purpose of the data collection, as well as incentives (financial or otherwise), can bolster buy-in and engagement, helping strengthen relationships between companies and their supply chain stakeholders and ensuring more accurate and robust data collection.  

4. Collect the data

To ensure a successful approach, data collection should only begin after a representative sample has been identified and the data collection team has been trained. 


Ensure enough time is allotted for this step and that there is ongoing support for the data collection team. Keep track of best practices and key learnings developed during the data collection process to shape future collection efforts.

5. Review your data

Once the data has been collected, it needs to undergo a thorough review to ensure consistency and address any errors. All mandatory information should be filled in and correct (e.g., all numbers and units are plausible).  We recommend doing a cross-check or benchmark between the different survey answers and external sources (e.g., FAO data or other databases) to see if answers make sense. For example, if one farm harvests 200 kilograms of a given crop and another farm of the same size and located in the same region harvests 2,000 kilograms of the same crop, this could be real data reflecting differences in yield across farms, or it could be an order of magnitude data error for one of the farms. 
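A simple automated pass can catch many of these issues before manual review. The sketch below uses hypothetical survey records and illustrative thresholds: it flags missing mandatory fields and yields that fall far outside a regional benchmark (such as an FAO average).

```python
# Plausibility checks on collected farm records. Records, benchmark and
# thresholds are illustrative; real checks would cover every mandatory field.

records = [
    {"farm": "A", "area_ha": 10, "harvest_kg": 200},
    {"farm": "B", "area_ha": 10, "harvest_kg": 2000},
    {"farm": "C", "area_ha": 10, "harvest_kg": None},  # missing mandatory field
]

BENCHMARK_YIELD_KG_HA = 180  # e.g., a regional average from external statistics

def flag_issues(records, low=0.2, high=5.0):
    """Flag missing values and yields outside [low, high] x the benchmark.

    Flagged records are candidates for follow-up, not automatic rejection:
    an outlier may be a real yield difference or a data-entry error.
    """
    issues = []
    for r in records:
        if r["harvest_kg"] is None or r["area_ha"] in (None, 0):
            issues.append((r["farm"], "missing or zero mandatory field"))
            continue
        yield_kg_ha = r["harvest_kg"] / r["area_ha"]
        if not low * BENCHMARK_YIELD_KG_HA <= yield_kg_ha <= high * BENCHMARK_YIELD_KG_HA:
            issues.append((r["farm"], f"implausible yield: {yield_kg_ha:.0f} kg/ha"))
    return issues

for farm, reason in flag_issues(records):
    print(farm, reason)
```

The point of the benchmark comparison is exactly the situation described above: it surfaces the 200 kg vs. 2,000 kg discrepancy for a human to resolve, rather than letting a possible order-of-magnitude error flow silently into the model.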


Data should only be fed into the model once the collected primary data has been deemed of sufficient quality. If data quality is not ensured, results will be biased, undermining all of the work that follows.

6. Process the data to measure environmental impact

Using a tool that enables both data collection and data modeling (i.e., a Type 2 tool) can be a major advantage when it comes to processing your data, both in terms of compatibility and efficient generation of results/emission factors. Such tools ensure that all the data needed for modeling are collected and prevent the collection of unnecessary or inadequate data, as well as the need for additional data collection after the fact. 


However, it is possible to input data collected separately into a modeling tool (e.g., through an application programming interface).

7. Carefully interpret the results

Once you have your results, you’ll need to interpret them. 

Interpretation is the process of both challenging results and learning from them — understanding what they mean and summarizing the most material takeaways. 


Key interpretation questions to explore include: 

  • Are the results different than expected when comparing to others or benchmarking? 
  • Will using a different tool provide different results? 
  • What are the major drivers of and contributors to the results? 
  • Does the observed directionality suggest how to act to drive change?
  • Are there key assumptions or uncertainties behind the results, especially its main drivers, that may alter the outcome? Do any assumptions need to be revisited?
  • Beyond the environmental impacts and insights provided by the tool, are there other major interpretations needed (e.g., feasibility of intended intervention scaling)? 
  • What are the limitations of the results and associated interpretations?

A major challenge companies face when interpreting results is comparing them when they have been generated using different tools. Results may vary between tools because of differences in scope (e.g., the inclusion/exclusion of carbon sequestration), assumptions, models and default values used.


It’s therefore important to distinguish between changes in results that come from switching from secondary to primary data (e.g., increased traceability) or from one tool to another, and changes in results that indicate effective change (e.g., due to specific interventions). When switching from secondary to primary data, or from one primary data collection tool to another, you’ll need to ensure that the reductions you measure and report are “real” and not just artifacts of calculations. Keep in mind that switching tools, or replacing a secondary emission factor with a primary one, does not in itself count as a reduction. Note that such a switch may also require rebaselining.
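One common way to separate a methodological shift from a real reduction is to re-model the baseline year with the new tool before comparing years, so that both sit on the same methodology. A toy example with illustrative numbers:

```python
# Separating a tool-change artifact from a real reduction by rebaselining.
# All figures are illustrative (t CO2e).

baseline_old_tool = 1000.0  # baseline year, modeled with the original tool
baseline_new_tool = 1150.0  # same baseline year, re-modeled with the new tool
current_new_tool = 1050.0   # current year, modeled with the new tool

# Naive comparison (wrong): mixes the tool change with actual change,
# and here even masks a reduction as an apparent increase.
naive_change = current_new_tool - baseline_old_tool

# Rebaselined comparison: both years on the same methodology.
real_change = current_new_tool - baseline_new_tool      # the reportable change
method_artifact = baseline_new_tool - baseline_old_tool  # from the tool switch

print(f"naive: {naive_change:+.0f}, real: {real_change:+.0f}, "
      f"method artifact: {method_artifact:+.0f}")
```

In this example the naive comparison shows a 50 t increase, while the rebaselined view reveals a 100 t real reduction alongside a 150 t methodological shift, which is exactly the distinction stakeholders need to see documented.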


For the sake of transparency and setting stakeholder expectations, it’s critical to communicate why results may vary and that the appropriate processes will be followed to ensure actual progress is reported.

Data collection: a means to an end

Collecting data is a means to an end, with the ultimate objective being to reduce impacts. Thus, before, during and after primary data collection, it’s important to keep the purpose of data collection — your “why” — top of mind: reducing your environmental impacts. In this context, primary data is useful to better understand the reality on the ground, prioritize, forecast and, finally, track progress.

Figure 8. Iterative process for the monitoring and implementation of impact reduction measures.

By focusing solely on refining data, without a true purpose in mind, you risk expending valuable resources (cost, time, personnel) without a clear outcome and detracting attention from the actions necessary to drive business transformation and alignment with planetary boundaries. 
