Approximately 92% (or 99%) of USHCN surface temperature data consists of estimated values | Watts Up With That?

An analysis of the U.S. Historical Climatology Network (USHCN) shows that only about 1% to 8% (depending on the stage of processing) of the data survives in the climate record as unaltered, non-estimated data.

Guest essay by John Goetz

A previous post showed that the adjustment models applied to the GHCN data produce estimated values for approximately 66% of the information supplied to consumers of the data, such as GISS. Because the US data is a relatively large contributor to the volume of GHCN data, this post looks at the effects of adjustment models on the USHCN data. The charts in this post use the data set downloaded at approximately 2:00 PM on 9/25/2015 from the USHCN FTP Site.

According to the USHCN V2.5 readme file: “USHCN version 2.5 is now produced using the same processing system used for GHCN-Monthly version 3. This reprocessing consists of a construction process that assembles the USHCN version 2.5 monthly data in a specific source priority order (one that favors monthly data calculated directly from the latest version of GHCN-Daily), quality controls the data, identifies inhomogeneities and performs adjustments where possible.”

There are three important differences from the GHCN process. First, the USHCN process produces an output, not available from GHCN, that shows the time-of-observation (TOBs) estimate for each station. Second, USHCN will attempt to estimate values for missing data, a process referred to as infilling; infilled data, however, is not used by GHCN. Third, the homogenized data for US stations produced by USHCN differs from the adjusted data for the same stations produced by GHCN. My conjecture is that this is because the GHCN homogenization models bring in data from across national boundaries, whereas the USHCN models do not. This requires further investigation.

Contribution of USHCN to GHCN

In the comments section of the previously referenced post, Tim Ball pointed out that USHCN contributes a disproportionate amount of data to the GHCN data set. The first chart below shows this contribution over time. Note that the US land area (including Alaska and Hawaii) is 6.62% of the total land area on Earth.

Percentage of Reporting GHCN Stations that are USHCN

How Much of the Data is Modeled?

The following chart shows the amount of data that is available in the USHCN record for every month from January 1880 to the present. The y-axis is the number of stations reporting data, so any point on the blue curve represents the number of measurements reported in the given month. The red curve represents the number of those monthly values that were calculated from incomplete daily temperature records. USHCN will calculate a monthly average with up to nine days missing from the daily record, and it flags such a month with a lower-case letter, from “a” (one day missing) to “i” (nine days missing). As the curve shows, approximately 25% of the monthly values were calculated with some daily values missing. The apparently seasonal behavior of the red curve warrants further investigation.
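To make the flag scheme concrete, here is a minimal sketch that turns the missing-days letter into a day count and computes the share of one month's values built from incomplete daily records. It assumes the flags are already extracted as strings (blank meaning a complete month); `days_missing` and `share_incomplete` are hypothetical helpers, and the real position of the flag in the USHCN record layout is not modelled.

```python
import string

# Assumption: flags "a".."i" encode 1..9 missing days, as described above;
# where the flag sits in the actual USHCN record layout is not modelled.
MAX_MISSING_DAYS = 9
MISSING_DAY_FLAGS = string.ascii_lowercase[:MAX_MISSING_DAYS]  # "abcdefghi"


def days_missing(flag: str) -> int:
    """Number of missing days implied by a monthly flag (blank means complete)."""
    flag = flag.strip()
    if not flag:
        return 0
    if len(flag) == 1 and flag in MISSING_DAY_FLAGS:
        return MISSING_DAY_FLAGS.index(flag) + 1
    raise ValueError(f"not a missing-days flag: {flag!r}")


def share_incomplete(flags: list[str]) -> float:
    """Fraction of one month's reported values computed from incomplete daily data."""
    if not flags:
        return 0.0
    return sum(days_missing(f) > 0 for f in flags) / len(flags)


print(days_missing("a"), days_missing("i"))  # 1 9
print(share_incomplete(["", "b", "", ""]))    # 0.25
```

Applying `share_incomplete` to every month in the record would give a series comparable in spirit to the red curve, expressed as a fraction rather than a count.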

Reporting USHCN Stations

The third chart shows the extent to which the adjustment models affect the USHCN data. The blue curve again shows the amount of data that is available in the USHCN record for every month. The purple curve shows the number of measurements each month that are estimated due to the TOBs adjustment. Approximately 91% of the USHCN record has a TOBs estimate. The green curve shows the number of measurements each month that are estimated due to homogenization, which amounts to approximately 99% of the record. As mentioned earlier, the GHCN and USHCN estimates for US data differ; in the case of GHCN, approximately 92% of the US record is estimated.

The red curve shows the amount of data that is discarded by the combination of USHCN homogenization and GHCN. Occasionally homogenization discards the original data outright and replaces it with the invalid-temperature placeholder -9999. More often it discards the data and replaces it with a value computed from surrounding stations; when that happens, the homogenized value is flagged with an “E”. GHCN does not use values flagged in this manner, which is why they are counted as discarded in the red curve.
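The "discarded" rule above can be written as a small predicate. This is a sketch, assuming each station-month is available as a (raw, homogenized, flag) triple of integer-coded values; `is_discarded` and the sample numbers are hypothetical, not taken from the USHCN files.

```python
MISSING = -9999  # placeholder written in place of a discarded value


def is_discarded(raw: int, homogenized: int, flag: str) -> bool:
    """True if a reported raw value does not survive into GHCN.

    Counted as discarded when homogenization either wrote the -9999
    placeholder or substituted an "E"-flagged estimate computed from
    surrounding stations, which GHCN does not use.
    """
    if raw == MISSING:
        return False  # no measurement was reported, so none was discarded
    return homogenized == MISSING or flag == "E"


# Hypothetical station-months: (raw, homogenized, flag)
months = [(1234, 1251, ""), (1180, MISSING, ""), (1302, 1288, "E"), (MISSING, 1275, "E")]
print(sum(is_discarded(*m) for m in months))  # 2
```

Note that an "E" value with no raw measurement behind it is infilling rather than a discard, which is why the last sample month is not counted.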

Reporting USHCN Stations and Extent of Estimates

The next chart shows the three sets of data (TOBs, homogenized, discarded) as a percentage of total data reported.
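Expressing an adjustment as a percentage of reported data amounts to counting, among station-months with a raw value, how many the adjusted output changed. A minimal sketch, assuming parallel lists of integer-coded raw and adjusted values (so exact comparison is meaningful); `percent_estimated` and the sample values are hypothetical.

```python
MISSING = -9999


def percent_estimated(raw: list[int], adjusted: list[int]) -> float:
    """Percent of reported raw values that an adjustment changed.

    Only station-months with a reported raw value are counted. A value counts
    as "estimated" when the adjusted value differs from the raw value, which
    includes values replaced by the -9999 placeholder.
    """
    pairs = [(r, a) for r, a in zip(raw, adjusted) if r != MISSING]
    if not pairs:
        return 0.0
    changed = sum(1 for r, a in pairs if a != r)
    return 100.0 * changed / len(pairs)


raw = [1210, 1305, MISSING, 1188]
tobs = [1210, 1297, 1250, 1174]
print(round(percent_estimated(raw, tobs), 1))  # 66.7
```

The same function applied to the homogenized output instead of the TOBs output would give the corresponding homogenization percentage.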

Extent of USHCN Estimates as a Percentage of Reporting Stations

The Effect of the Models

The fifth chart shows the average change to the raw value due to the TOBs adjustment model replacing it with an estimated value. The curve includes all estimates, including the 9% of cases where the TOBs value is equal to the raw data value.

Change to Raw USHCN Value after TOB Estimate

The sixth chart shows the average change to the raw value due to the homogenization model. The curve includes all estimates, including the 1% of cases where the homogenized value is equal to the raw data value.
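Both of these charts average the change (estimate minus raw) over every station reporting in a month, with unchanged values contributing zero. A minimal sketch of that average for one month, assuming parallel lists of raw and adjusted values; `mean_change` is a hypothetical helper.

```python
from statistics import mean

MISSING = -9999


def mean_change(raw: list[float], adjusted: list[float]) -> float:
    """Average (adjusted - raw) for one month across stations.

    Station-months where either value is missing are skipped. Unchanged
    values contribute a change of zero, so they pull the average toward zero.
    """
    diffs = [a - r for r, a in zip(raw, adjusted) if MISSING not in (r, a)]
    return mean(diffs) if diffs else float("nan")


print(mean_change([10.0, 20.0, 30.0, MISSING], [10.0, 22.0, 34.0, 15.0]))  # 2.0
```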

Change to Raw USHCN Value after Homogenization Estimate

Incomplete Months

As described earlier, USHCN will calculate a monthly average if up to nine days' worth of data is missing. The following chart shows the percentage of months in the record that are incomplete (red curve) and the percentage of months that are retained after the adjustment models are applied (black curve). It is apparent that incomplete months are not often discarded.

Number of USHCN Monthly Averages Calculated with Incomplete Daily Records

The next chart shows the average number of days that were missing when the month’s daily record was incomplete. After some volatility prior to 1900, the average incomplete month is missing approximately two days of data (about 6.5% of the month).
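Two missing days are 6.5% of a 31-day month but 7.1% of a 28-day month, so the percentage depends on which months are being averaged. The sketch below computes both the average days missing and the average fraction of the month missing, assuming (year, month, days_missing) tuples for incomplete months; `mean_missing` is a hypothetical helper that uses the standard `calendar.monthrange` to get each month's length.

```python
import calendar


def mean_missing(records):
    """Average days missing, and average fraction of the month missing.

    records: iterable of (year, month, days_missing) tuples for incomplete
    months; entries with zero days missing are ignored.
    """
    days, fractions = [], []
    for year, month, missing in records:
        if missing <= 0:
            continue
        month_length = calendar.monthrange(year, month)[1]  # 28..31
        days.append(missing)
        fractions.append(missing / month_length)
    if not days:
        return 0.0, 0.0
    return sum(days) / len(days), sum(fractions) / len(fractions)


avg_days, avg_frac = mean_missing([(2015, 1, 2), (2015, 3, 2)])
print(avg_days, f"{avg_frac:.1%}")  # 2.0 6.5%
```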

Average Number of Days Missing from Incomplete USHCN Monthly Averages

A Word on Infilling

The USHCN models will produce estimates for some months that are missing, and will occasionally replace a month entirely with an estimate if there are too many inhomogeneities. The last chart shows how often this occurred in the USHCN record. The blue curve shows the number of non-existent measurements that are estimated by the infilling process. The purple curve shows the number of existing measurements that are discarded and replaced by the infilling process. Prior to 1920, estimating missing data was a frequent occurrence. Since then, existing data has been replaced more often than missing data has been estimated.
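The two infilling cases differ only in whether a raw measurement existed. A minimal sketch of the counts behind the blue and purple curves for one month, assuming (as a labeled assumption, not a statement of the file format) that every estimate in the final output carries an "E" flag; `classify_infill` and `count_infill` are hypothetical helpers.

```python
from collections import Counter
from typing import Optional

MISSING = -9999


def classify_infill(raw: int, flag: str) -> Optional[str]:
    """Classify one station-month of the final (infilled) output.

    Assumption: an estimate in the final output carries the "E" flag.
    Returns "infilled" for an estimate of a non-existent measurement,
    "replaced" for an existing measurement that was swapped for an
    estimate, and None otherwise.
    """
    if flag != "E":
        return None
    return "infilled" if raw == MISSING else "replaced"


def count_infill(month):
    """month: iterable of (raw, flag) pairs for one month across stations."""
    counts = Counter(classify_infill(raw, flag) for raw, flag in month)
    return counts["infilled"], counts["replaced"]


print(count_infill([(MISSING, "E"), (1180, "E"), (1302, ""), (1255, "E")]))  # (1, 2)
```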

Infilled data is not present in the GHCN adjustment estimates.

Amount of USHCN Infilling of Missing Data


The US accounts for 6.62% of the land area on Earth, but for 39% of the data in GHCN. Overall, from 1880 to the present, approximately 99% of the temperature data in the USHCN homogenized output has been estimated (that is, it differs from the original raw data). Approximately 92% of the temperature data in the USHCN TOB output has been estimated. The GHCN adjustment models estimate approximately 92% of the US temperatures, but those estimates match neither the USHCN TOB nor the homogenized estimates.
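Put as a ratio, the two shares quoted above mean the US supplies roughly six times as much GHCN data as its land area alone would suggest. The arithmetic, as a trivial sketch with a hypothetical helper name:

```python
def overrepresentation(data_share_pct: float, area_share_pct: float) -> float:
    """How many times larger a region's share of the data is than its share of land area."""
    return data_share_pct / area_share_pct


# Figures from the text: 39% of GHCN data, 6.62% of Earth's land area.
print(round(overrepresentation(39.0, 6.62), 2))  # 5.89
```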

The homogenization estimate introduces a positive temperature trend of approximately 0.34 °C per century relative to the USHCN raw data. The TOBs estimate introduces a positive temperature trend of approximately 0.16 °C per century. These are not additive: the homogenization trend already accounts for the TOBs trend.
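The post does not say how these trends were fitted. One common approach, assumed here, is an ordinary least-squares slope of the difference series (adjusted minus raw) against time, scaled to a century. The sketch uses made-up annual values; `trend_per_century` is a hypothetical helper.

```python
def trend_per_century(years, values):
    """Ordinary least-squares slope of values on years, expressed per 100 years."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two or more (year, value) pairs")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return 100.0 * sxy / sxx


# Hypothetical annual means (°C): a raw series and an adjusted series that
# drifts upward by 0.0034 °C per year relative to it.
years = list(range(1900, 2001))
raw = [10.0 + 0.001 * (y - 1900) for y in years]
adjusted = [r + 0.0034 * (y - 1900) for r, y in zip(raw, years)]
difference = [a - r for a, r in zip(adjusted, raw)]
print(round(trend_per_century(years, difference), 2))  # 0.34
```

Because the least-squares slope is linear in the data, the trend of (homogenized - raw) equals the trend of (homogenized - TOBs) plus the trend of (TOBs - raw). When the homogenized series is built on TOBs-adjusted data, the TOBs contribution is therefore already inside the homogenization figure, which is why the two numbers should not be summed.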

Note: A couple of minutes after publication, the subtitle was edited to be more accurate, reflecting a range of percentages in the data.

It should also be noted that the U.S. Climate Reference Network (USCRN), designed from the start to be free of the need for ANY adjustment of data, does not show any trend, as I highlighted in June 2015 in the article “Despite attempts to erase it globally, ‘the pause’ still exists in pristine US surface temperature data”.

Here is the data plotted from that network:

Of course, Tom Karl and Tom Peterson of NOAA/NCDC (now NCEI) never let this USCRN data see the light of day in a public press release or a State of the Climate report for media consumption; it is relegated to a back room of their website and never mentioned. Instead, when it comes to claims about the hottest year/month/day ever, the highly adjusted, highly uncertain USHCN/GHCN data is what the public sees in these regular communications.

One wonders why NOAA NCDC/NCEI spent millions of dollars to create a state-of-the-art climate network for the United States and then never uses it to inform the public. Perhaps it is because it doesn’t give the result they want? – Anthony Watts
