Simmel time space compression
12/7/2023

Hundreds of petabytes are produced annually at weather and climate forecast centers worldwide. Compression is essential to reduce storage and to facilitate data sharing. Current techniques do not distinguish the real from the false information in data, leaving the level of meaningful precision unassessed. Here we define the bitwise real information content from information theory for the Copernicus Atmospheric Monitoring Service (CAMS). Most variables contain fewer than 7 bits of real information per value and are highly compressible due to spatio-temporal correlation. Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself. All CAMS data are 17× compressed relative to 64-bit floats, while preserving 99% of real information. Combined with four-dimensional compression, factors beyond 60× are achieved. A data compression Turing test is proposed to optimize compressibility while minimizing information loss for the end use of weather and climate forecast data.

Many supercomputing centers in the world perform operational weather and climate simulations several times per day [1]. The European Centre for Medium-Range Weather Forecasts (ECMWF) produces 230 TB of data on a typical day and most of the data are stored on magnetic tapes in its archive. This data production is predicted to quadruple within the next decade due to the increased spatial resolution of the forecast model [2, 3, 4]. Initiatives towards operational predictions with global storm-resolving simulations, such as Destination Earth [5] or DYAMOND [6], at a grid spacing of a couple of kilometers, will further increase the volume of data. These data describe physical and chemical variables for the atmosphere, ocean and land in up to six dimensions: three in space, as well as time, forecast lead time and the ensemble dimension. The last dimension results from calculating an ensemble of forecasts to estimate the uncertainty of predictions [7, 8]. Most geophysical and geochemical variables are highly correlated in all of the dimensions, a property that is rarely exploited for climate data compression, although multidimensional compressors are being developed [9, 10, 11, 12].

Floating-point numbers are the standard to represent real numbers in binary form. 64-bit double-precision floating-point numbers (Float64) consist of a sign bit, 11 exponent bits representing a power of two, and 52 mantissa bits allowing for 16 decimal places of precision across more than 600 orders of magnitude [13]. Most weather and climate models are based on Float64 arithmetic, which has been questioned, as the transition to 32-bit single-precision floats (Float32) does not necessarily decrease the quality of forecasts [14, 15]. Many bits in Float32 only contain a limited amount of information, as even 16-bit arithmetic has been shown to be sufficient for parts of weather and climate applications [16, 17, 18, 19]. Shannon's information theory [20, 21] introduced a mathematical concept to quantify information for the outcomes of a random variable. The information is analyzed in relation to the variable's statistics or the statistical dependence on other variables and is often interpreted as the surprise about an outcome. Applied to binary numbers in simple chaotic dynamical systems, the information is zero for many of the 32 bits in Float32 [22].
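The claim that rounding bits without real information to zero helps lossless compression can be illustrated with a small sketch. This is not the method used for the CAMS data: the function names `round_float32_bits` and `compressed_size`, the choice of 7 kept mantissa bits, the synthetic field and the use of zlib are all assumptions made for illustration. The sketch rounds Float32 values to a fixed number of mantissa bits (round to nearest, ties to even, finite values only) and compares the zlib-compressed size before and after rounding.

```python
import math
import struct
import zlib


def round_float32_bits(x: float, keepbits: int) -> float:
    """Round x (stored as Float32) to `keepbits` mantissa bits.

    The trailing 23 - keepbits mantissa bits are set to zero using
    round-to-nearest, ties-to-even. Intended for finite values only.
    """
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23")
    # Reinterpret the Float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    drop = 23 - keepbits  # number of trailing mantissa bits to zero
    if drop > 0:
        half = 1 << (drop - 1)               # half of the last kept bit
        last_kept_is_odd = (bits >> drop) & 1
        # Adding (half - 1) plus the parity of the last kept bit gives ties-to-even.
        bits = (bits + half - 1 + last_kept_is_odd) & 0xFFFFFFFF
        bits &= ~((1 << drop) - 1) & 0xFFFFFFFF  # zero the dropped bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def compressed_size(values, keepbits=None) -> int:
    """Size in bytes of the zlib-compressed Float32 encoding of `values`."""
    if keepbits is not None:
        values = [round_float32_bits(v, keepbits) for v in values]
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 9))


# Synthetic, smooth "temperature-like" series (hypothetical, not CAMS data):
# a slow oscillation plus a small fast component that fills the trailing bits.
field = [
    280.0 + 10.0 * math.sin(0.01 * i) + 0.001 * math.sin(7.3 * i)
    for i in range(10_000)
]

full = compressed_size(field)                 # all 23 mantissa bits kept
rounded = compressed_size(field, keepbits=7)  # only 7 mantissa bits kept

print(f"uncompressed Float32: {4 * len(field)} bytes")
print(f"zlib, full precision: {full} bytes")
print(f"zlib, 7 mantissa bits: {rounded} bytes")
```

In this sketch the number of kept bits is chosen by hand. In the approach described above it would instead follow from the bitwise real information content of each variable, so that only bits carrying real information are retained.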