Facts and insights into the environmental impact of data storage, plus practical techniques to minimise it.
Data storage can feel invisible, but it has a real physical footprint: electricity for servers and drives, cooling and power-delivery overheads, networks that move data around, and the embodied impacts of manufacturing and replacing hardware.
Storage is often discussed as part of “data centres” or “the cloud,” but it deserves its own lens because storage decisions quietly drive long-term energy use. The data you keep today is the data you will power, cool, protect, back up, replicate, and eventually migrate tomorrow.
This guide explains what actually drives emissions in storage, how to measure what matters, and how to reduce impact without sacrificing resilience, security, or compliance.
Start with what is measurable
Before optimising anything, define what you are counting. For most organisations, storage-related emissions show up in four places:
- Electricity use in facilities (servers, storage arrays, networking, cooling, power conversion).
- Electricity used by your cloud or colocation providers (still “your” footprint in most climate accounting).
- Data transmission (moving data across networks and regions consumes energy, especially at scale).
- Embodied impacts (manufacturing hardware, replacing it, and handling end-of-life).
At a global level, the International Energy Agency estimates data centres used roughly 240–340 TWh of electricity in 2022 (around 1–1.3% of global final electricity demand). That number is significant on its own, and it matters more when growth is concentrated in grids that are already under strain or still fossil-heavy. Source: IEA
For carbon accounting, the most widely used framework is the Greenhouse Gas Protocol. For purchased electricity (Scope 2), it is common to report emissions two ways: location-based (average grid intensity where electricity is used) and market-based (reflecting contractual instruments such as renewable energy purchases). Source: GHG Protocol Scope 2 Guidance (PDF)
Why does this matter? Because “we buy renewables” and “our workloads run in a low-carbon grid” are not the same claim. Both can be legitimate, but they should be distinguished so teams can prioritise the actions that most reliably reduce real-world emissions.
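To make the dual-reporting distinction concrete, here is a minimal sketch of how location-based and market-based Scope 2 figures can diverge for the same electricity use. All figures (energy use, grid intensity, contracted renewables, residual-mix factor) are hypothetical placeholders, and the function is a simplification of the GHG Protocol method, not an implementation of it.

```python
from typing import Optional

def scope2_emissions(energy_kwh: float,
                     grid_intensity_kg_per_kwh: float,
                     contracted_renewable_kwh: float = 0.0,
                     residual_mix_kg_per_kwh: Optional[float] = None) -> dict:
    """Return rough location-based and market-based Scope 2 estimates (kg CO2e)."""
    # Location-based: all consumption at the average grid intensity.
    location_based = energy_kwh * grid_intensity_kg_per_kwh
    # Market-based (simplified): kWh covered by contractual instruments count
    # at zero; the remainder uses a residual-mix factor (fall back to grid average).
    residual = (residual_mix_kg_per_kwh
                if residual_mix_kg_per_kwh is not None
                else grid_intensity_kg_per_kwh)
    uncovered_kwh = max(energy_kwh - contracted_renewable_kwh, 0.0)
    market_based = uncovered_kwh * residual
    return {"location_based_kg": location_based, "market_based_kg": market_based}

# Hypothetical example: 100 MWh on a 0.35 kg/kWh grid, 60 MWh covered by
# renewable contracts, residual mix at 0.40 kg/kWh.
print(scope2_emissions(100_000, 0.35,
                       contracted_renewable_kwh=60_000,
                       residual_mix_kg_per_kwh=0.40))
```

The two numbers answer different questions: the location-based figure tracks what the local grid actually emitted; the market-based figure reflects procurement choices.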
Build a storage baseline that is actually useful
A practical storage baseline does not need to be perfect. It needs to be decision-relevant. Start with:
- Total stored capacity (logical and physical, if you can measure both).
- Growth rate (monthly or quarterly).
- Data temperature (hot/warm/cold, or frequently accessed vs rarely accessed).
- Replication and backup footprint (how many copies exist and where).
- Facility efficiency (PUE or equivalent metrics for on-prem; provider metrics for cloud/colo).
- Electricity carbon intensity for each region where storage resides.
If you already track organisational emissions, treat storage like a “subsystem” with its own KPIs: capacity growth, percent cold data, percent duplicated, and percent stored in low-carbon regions or low-carbon providers.
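A baseline like the one above can be rolled into a first-order carbon estimate: physical capacity times power-per-terabyte, scaled by facility overhead (PUE) and regional grid intensity. The sketch below covers operational electricity only (no embodied impacts), and every number in it, power-per-TB, PUE, grid intensity, is an assumed placeholder to be replaced with measured values.

```python
HOURS_PER_YEAR = 8760

def annual_storage_kgco2e(physical_tb: float,
                          watts_per_tb: float,
                          pue: float,
                          grid_kg_per_kwh: float) -> float:
    """First-order annual carbon estimate for a storage tier (operational only)."""
    it_kwh = physical_tb * watts_per_tb * HOURS_PER_YEAR / 1000  # IT load
    facility_kwh = it_kwh * pue          # add cooling/power-conversion overhead
    return facility_kwh * grid_kg_per_kwh

# Hypothetical fleet: a small hot flash tier on a clean grid vs a large
# cold HDD tier in a less efficient facility on a fossil-heavy grid.
tiers = [
    {"name": "hot-flash", "tb": 500,  "w_per_tb": 8.0, "pue": 1.3, "grid": 0.05},
    {"name": "cold-hdd",  "tb": 4000, "w_per_tb": 5.0, "pue": 1.6, "grid": 0.40},
]
for t in tiers:
    kg = annual_storage_kgco2e(t["tb"], t["w_per_tb"], t["pue"], t["grid"])
    print(f'{t["name"]}: {kg / 1000:.1f} tCO2e/year')
```

Even a rough model like this is decision-relevant: it shows where capacity, facility efficiency, and grid intensity each dominate, which is exactly what the KPIs above are for.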
For a broader primer on household and organisational footprint thinking (and why measurement can drive action), see: carbon footprint basics.
Choose low-carbon, high-efficiency facilities
Where your storage runs can matter as much as what hardware you buy. Facility choices influence:
- Cooling demand (climate and design determine how much energy is spent moving heat away).
- Power conversion losses (UPS and power distribution efficiency).
- Grid carbon intensity (a clean grid can reduce emissions even if electricity use is unchanged).
- Water impacts (some cooling approaches are water-intensive; impacts depend on local scarcity).
PUE (power usage effectiveness) is a common proxy for facility efficiency: total facility energy divided by IT equipment energy. Industry averages have improved over time, but progress has slowed. Uptime Institute has reported that the industry-wide average PUE has been broadly flat since around 2020, with 2023 around 1.58. Source: Uptime Institute analysis
Two implications follow:
- There are still big gains available by moving workloads from legacy facilities into efficient sites (or compelling providers to modernise).
- Efficiency alone is not enough; the carbon intensity of electricity is a parallel lever.
What “low-carbon hosting” looks like in practice
A credible low-carbon strategy usually combines:
- Efficient facilities (better cooling, power delivery, and operational discipline).
- Cleaner electricity (low-carbon grids, credible renewable procurement, and transparent reporting).
- Smarter data architecture (less stored, fewer copies, less movement).
If you are assessing providers, look for clear disclosure on energy sourcing and facility efficiency, and prefer providers that publish methodology and progress data. For macro context on data centres and energy demand, the IEA maintains an overview that is regularly updated. IEA: Data centres and data transmission networks
Reduce what you store before you optimise how you store it
The single highest-leverage storage intervention is often unglamorous: store less. Every extra dataset tends to produce more copies, more backups, more migrations, and more “just in case” replicas over time.
Data minimisation has three parts:
- Don’t collect what you do not need.
- Don’t keep what you cannot justify.
- Don’t duplicate by default (unless resilience or policy requires it).
Make retention and deletion policies real
Many organisations have retention policies that exist on paper but not in practice. A realistic policy includes:
- Data classification (business value, sensitivity, legal retention requirements).
- Lifecycle rules (hot to warm to cold tiers; archive; delete).
- Ownership (who can approve a dataset’s continued storage and why).
- Automation (policy enforcement at the storage layer wherever possible).
- Exceptions that are time-bounded, documented, and reviewed.
Deletion can be politically difficult because “data might be useful someday.” A workaround is to require explicit re-justification: if a dataset has no owner willing to defend its value, it becomes a candidate for archival or deletion.
There is also an equity angle: wasteful “store everything forever” practices shift energy burdens onto grids and communities that may not benefit from the data’s value. This is where circular economy thinking becomes relevant: optimise for longer lifetimes and less throughput of materials and energy. Circular economy basics
Cut physical storage demand with capacity optimisation
Once you control unnecessary growth, you can reduce the physical footprint of what remains. Several techniques reduce the amount of hardware needed for the same logical data:
- Deduplication (store unique blocks once, reference duplicates).
- Compression (reduce the size of data at rest, with compute trade-offs).
- Thin provisioning (allocate storage logically without reserving physical capacity until needed).
- Tiering (place data on storage that matches its access patterns and performance needs).
- Snapshot discipline (snapshots can quietly multiply retained data).
Capacity optimisation saves energy by shrinking the physical disk or flash capacity needed for a given logical dataset, which in turn shrinks the supporting hardware footprint. The Storage Networking Industry Association discusses how capacity optimisation methods can contribute to energy efficiency outcomes. Source: SNIA energy efficiency guidance (PDF)
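The core idea behind deduplication is easy to demonstrate: key each block by a content hash and store unique blocks once. This toy uses fixed-size blocks and an in-memory dict; production systems use variable-size chunking, reference counting, and persistent indexes, so treat this purely as an illustration of the logical-vs-physical gap.

```python
import hashlib

BLOCK_SIZE = 4096

def dedup_store(data: bytes, store: dict) -> list:
    """Store data as blocks; return the list of block hashes (the 'recipe')."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)   # duplicates become references, not copies
        recipe.append(h)
    return recipe

store = {}
data = b"A" * 8192 + b"B" * 4096     # three blocks, one of them repeated
recipe = dedup_store(data, store)

logical = len(data)
physical = sum(len(b) for b in store.values())
print(f"logical={logical}B physical={physical}B ratio={logical / physical:.1f}x")
```

The ratio between logical and physical bytes is the saving; how large it gets in practice depends entirely on how repetitive the workload's data is (backups and VM images dedupe well, encrypted or compressed data barely at all).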
Be honest about the trade-offs
Not every optimisation is universally “green.” Some techniques shift energy from storage hardware to compute (for example, compression). The right approach depends on:
- Workload type (structured vs unstructured, write-heavy vs read-heavy).
- Access patterns (frequent access vs archival).
- Latency and resilience requirements.
- Whether your compute runs in cleaner electricity regions than your storage.
The goal is not to chase a single metric. It is to reduce total impact while maintaining the service users actually need.
Reduce emissions from data movement
Storage is not only about keeping data; it is also about moving it: ingestion, replication, backup, analytics, and training pipelines. Moving high volumes of data across networks and regions can add energy demand and often increases storage duplication along the way.
Practical ways to reduce movement without harming reliability include:
- Keep processing close to the data where possible (compute-to-data rather than data-to-compute).
- Reduce cross-region replication to what is necessary for resilience and legal requirements.
- Use incremental backups and verify that “full backups” are not happening unnecessarily.
- Consolidate pipelines that create multiple derived datasets doing the same job.
- Prefer “cold path” architectures for rarely accessed data (less frequent scanning, indexing, and migration).
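The incremental-backup point above comes down to comparing manifests and moving only what changed. A minimal sketch, with hypothetical file paths, using (size, mtime) metadata as the change signal; real tools such as rsync or restic do this far more robustly (and can compare content hashes to catch metadata-preserving edits).

```python
def changed_since(previous: dict, current: dict) -> list:
    """Return paths whose (size, mtime) differ from the previous manifest,
    including paths that are new in the current manifest."""
    return [path for path, meta in current.items()
            if previous.get(path) != meta]

# Manifests map path -> (size_bytes, mtime). All entries are hypothetical.
prev = {"db/orders.parquet": (10_000, 1700000000.0),
        "logs/app.log":      (2_000,  1700000000.0)}
curr = {"db/orders.parquet": (10_000, 1700000000.0),   # unchanged: skipped
        "logs/app.log":      (2_500,  1700086400.0),   # grew: re-copied
        "logs/new.log":      (100,    1700086400.0)}   # new: copied

print(changed_since(prev, curr))
```

In this example only two of three files cross the network, and the saving compounds: less transfer energy, fewer redundant copies landing in backup storage, and shorter backup windows.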
Network and internet infrastructure also have impacts, and those impacts are often hidden from consumer and business narratives. For background on uneven access and infrastructure burdens globally, see: internet access around the world.
Put cold data on cold storage
A large share of organisational data is “cold”: rarely accessed but retained for compliance, safety, or institutional memory. Treating cold data like hot data is expensive and wasteful.
Cold-data strategies usually include:
- Low-power storage tiers designed for infrequent access.
- Long retention media where appropriate.
- Clear retrieval expectations (minutes vs hours vs days), so teams do not over-provision “just in case” performance.
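Temperature classification is usually just thresholds on access recency. The tier names and day limits below are assumptions to be tuned per workload, not recommendations; the point is that the mapping should be explicit and automatable rather than left to intuition.

```python
# Ordered (max days since last access, tier) rules; None = catch-all.
TIERS = [
    (7,    "hot"),    # frequently accessed: fast flash
    (90,   "warm"),   # occasional access: HDD / infrequent-access class
    (None, "cold"),   # rarely accessed: archive tier or offline media
]

def tier_for(days_since_access: int) -> str:
    """Return the first tier whose threshold covers the access recency."""
    for limit, tier in TIERS:
        if limit is None or days_since_access <= limit:
            return tier
    return "cold"

print([tier_for(d) for d in (1, 30, 400)])
```

Pairing a rule like this with the retrieval expectations above (minutes vs hours vs days) gives teams a defensible reason to stop paying hot-tier energy and cost for data nobody reads.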
Tape can be a rational choice for some archives
Tape storage is unfashionable but still relevant for long-term archives. In the right context, it can reduce energy use because tape libraries can keep media offline when not in use, while still supporting large-scale retention needs.
Tape is not a universal solution. It works best when:
- Data is truly cold and retrieval time is not critical.
- Retention periods are long (years).
- Organisations can manage media and migration planning responsibly.
The point is not “tape vs cloud.” The point is matching data temperature to the least-impactful storage that still meets the need.
Extend hardware life and design for circularity
When storage discussions focus only on operational energy, they miss a major lever: hardware turnover. Manufacturing, transporting, and disposing of equipment carry embodied impacts that can be significant, especially when replacement cycles are short.
Practical steps to reduce lifecycle impacts include:
- Procure for durability and repairability, not just peak performance.
- Right-size performance so hardware is not overbuilt for the job.
- Extend refresh cycles where reliability allows (supported by monitoring and proactive maintenance).
- Use refurbishment and secondary markets for suitable use cases.
- Plan end-of-life responsibly (reuse first, then high-quality recycling with traceability).
In circular-economy terms, the cleanest device is often the one you do not replace. This is also where governance matters: if teams get rewarded for “new” rather than “sufficient,” footprints rise quietly.
Cooling and power management that actually works
Cooling can be a major share of facility energy use, and the most effective strategies tend to be operationally disciplined rather than flashy:
- Hot/cold aisle containment (prevent mixing of hot exhaust and cold supply air).
- Free cooling where climate allows (use outside air or water loops rather than mechanical chillers).
- Setpoints and airflow tuning (avoid overcooling “just in case”).
- Continuous monitoring (temperature, humidity, power, airflow, and anomalies).
Because average PUE improvements have slowed, many gains now come from careful engineering and operational discipline, plus broader choices about where workloads run. For a snapshot of industry-wide trends and the broader context around efficiency plateaus, see Uptime Institute’s research resources. Source: Uptime Institute survey summary
AI, training data, and “storage sprawl”
AI systems can intensify storage demand in two ways:
- More data retained for training, fine-tuning, and evaluation.
- More copies created across pipelines (raw, cleaned, labelled, augmented, feature stores, checkpoints).
Even when compute gets most of the attention, storage decisions shape how large AI-related datasets become and how long they persist. If teams are not careful, “keep everything” becomes a default that multiplies impact and risk.
The IEA has noted that AI is a growing driver of data centre electricity demand, while also emphasising that the broader picture depends on assumptions about efficiency improvements, deployment patterns, and policy responses. Source: IEA, Energy and AI
A practical approach is to treat training data as a governed asset:
- Define datasets with owners and retention horizons.
- Minimise duplication across teams via shared, well-documented sources of truth.
- Archive responsibly (cold tiers or offline media if retention is necessary).
- Delete derived artifacts that are not needed after evaluation or deployment.
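Treating training data as a governed asset can start with something as simple as a registry where every dataset has an owner, a retention horizon, and a pointer to what it was derived from. All names and dates below are hypothetical; the lineage field is what lets you safely delete derived artifacts while flagging sources of truth for review instead.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class GovernedDataset:
    name: str
    owner: str
    retain_until: date
    derived_from: Optional[str] = None   # None = a source of truth

registry = [
    GovernedDataset("reviews-raw", "data-eng", date(2027, 1, 1)),
    GovernedDataset("reviews-cleaned", "ml-team", date(2026, 1, 1),
                    derived_from="reviews-raw"),
    GovernedDataset("reviews-augmented", "ml-team", date(2025, 6, 1),
                    derived_from="reviews-cleaned"),
]

def deletion_candidates(datasets: List[GovernedDataset], today: date) -> list:
    """Derived artifacts past their retention horizon are safe deletion
    candidates; expired sources of truth should go to owner review instead."""
    return [d.name for d in datasets
            if d.retain_until < today and d.derived_from is not None]

print(deletion_candidates(registry, date(2025, 7, 1)))
```

Because derived artifacts (cleaned, augmented, feature-store, checkpoint copies) usually outnumber their sources, this is where the "keep everything" multiplier is cheapest to reverse.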
Pick providers and vendors with verifiable sustainability practices
If you use cloud or managed storage, your impact is strongly influenced by provider choices. Marketing claims vary; the best signals tend to be concrete:
- Transparent reporting (including methods and boundaries).
- Clear electricity strategy (grid location, procurement, and claims backed by documentation).
- Efficiency measures (facility performance, equipment lifecycle practices).
- Asset lifecycle programs (repair, reuse, and responsible end-of-life).
Where possible, prefer providers that publish the details behind their claims, rather than vague “green cloud” language. When in doubt, ask direct questions: Where is the data stored? What is the regional grid intensity? What is the facility’s efficiency? What happens to retired equipment?
Remote work has increased reliance on cloud tooling for many organisations; the sustainability story of “cloud vs on-prem” depends heavily on the specifics. Remote work and the planet
Practical checklist: the highest-impact moves
- Measure: establish a storage baseline (capacity, growth, copies, regions, efficiency).
- Minimise: stop collecting and retaining data without a clear purpose or owner.
- Automate: lifecycle policies that tier, archive, and delete according to classification.
- Optimise: use dedupe, compression, thin provisioning, and snapshot discipline where appropriate.
- Right-place: host storage in efficient facilities and lower-carbon grids where feasible.
- Right-tier: put cold data on genuinely cold storage; consider offline/archive approaches where suitable.
- Extend life: reduce refresh churn and plan hardware reuse and responsible end-of-life.
- Reduce movement: avoid unnecessary transfers and cross-region duplication.
The bottom line
The environmental impact of data storage is not a mystery problem. It is a systems problem: electricity, facilities, hardware lifecycle, and organisational habits that quietly inflate what gets stored and for how long.
The most reliable reductions start upstream: keep only what you need, govern datasets like real assets, and match data temperature to the lowest-impact storage that still meets resilience and compliance requirements. Then optimise the remainder with capacity reduction techniques, efficient facilities, and credible electricity strategies.
Reducing storage footprint is not only an environmental action. Done well, it also improves security, reduces costs, and forces clarity about what information truly matters.