Norway’s 2 petabytes of Huawei flash storage and LLM training

Norway’s National Library is building a sovereign LLM in Norwegian because no commercial provider is making a local-language model.
The Ministry of Culture assigned the library the task because it holds Norway’s largest digital collection of books, newspapers, web pages, and other cultural heritage materials.
A copyright agreement with Norwegian newspapers allows the library to train on copyrighted content, an arrangement that private companies do not have.
The library has digitized materials since 2005 and now stores 20 PB of unique data; its 3-2-1 preservation setup keeps three copies, for about 60 PB in total.
Its AI pipeline uses 2 PB of Huawei OceanStor Dorado flash storage for low-latency data ingestion and preprocessing.
The flash layer is used for cleaning, deduplication, normalization, validation, and preparing data before training.
After preprocessing, the data is sent to Norway’s national supercomputer, Sigma2 Olivia, for LLM training runs.
Husnes said the main bottlenecks are data quality and pipeline throughput, not compute.
He also cited unresolved challenges in evaluation, governance, and orchestration across preservation, on-prem AI, and supercomputing systems.
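
The preprocessing stages named above (cleaning, normalization, validation, deduplication) can be illustrated with a minimal Python sketch. It is not the library's actual pipeline, which the source does not describe in detail. The function names, the 20-character validity threshold, and the choice of exact-match SHA-256 deduplication are all assumptions made for illustration; a production pipeline would likely add near-duplicate detection and language-specific cleaning.

```python
import hashlib
import re
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def is_valid(text: str, min_chars: int = 20) -> bool:
    """Hypothetical validation rule: keep only documents with enough content."""
    return len(text) >= min_chars


def preprocess(docs):
    """Yield normalized, validated documents, dropping exact duplicates."""
    seen = set()  # SHA-256 digests of documents already emitted
    for raw in docs:
        text = normalize(raw)
        if not is_valid(text):
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield text


if __name__ == "__main__":
    sample = [
        "Nasjonalbiblioteket  har digitalisert\nmateriale siden 2005.",
        "Nasjonalbiblioteket har digitalisert materiale siden 2005.",
        "For kort.",
    ]
    for doc in preprocess(sample):
        print(doc)
```

In this sketch the first two sample documents differ only in whitespace, so they collapse to one record after normalization, and the third is dropped by the length check. Hashing the normalized text rather than the raw text is the design choice that lets trivially different copies be caught.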
