Time Series Toolbox (TST)

User Guide


Welcome to the Time Series Toolbox


The Time Series Toolbox enables users to perform preliminary analysis on either user-uploaded time series data or preloaded United States Geological Survey (USGS) streamflow gage data. Without programming expertise, users can deploy streamlined analysis pipelines, uncovering previously hidden data patterns and rapidly moving from data acquisition to analytic insight. Thus, the tool enables more consistent, repeatable, and efficient time series analysis.

This tool applies various statistical tests to facilitate a better understanding of the data, including tests for trend detection, change point analysis, breakpoint analysis, and time series modeling. In particular, the tool can detect nonstationarities in the historical record to help the user segment the record into datasets whose statistical properties can be considered stationary. Users can also explore their data through three different time series models: Auto-Regressive Integrated Moving Average (ARIMA), Exponential Smoothing (ETS), and Linear Models (TSLM). These models can be applied in the forecasting, error handling, interpretation, and decomposition of hydrologic and meteorologic data.

This tool was developed to detect nonstationarities in hydrometeorological time series. The stationarity assumption (i.e., that the statistical properties of a dataset are unchanging with time) is a fundamental concept underlying many different types of hydrological analysis. This tool is applied to evaluate the stationarity of hydrometeorological records analyzed in support of USACE planning and engineering decision-making.

This functionality is contained within four different sheets:

  • Explore Data — This sheet allows users to select a) their own data or b) preloaded USGS gage datasets (annual maximum and mean monthly streamflow and stage [gage height]). From there, the user can visualize that data for immediate inspection and evaluation, with further exploration on the Data Summary, Summary Statistics, and Seasonality tabs. If USGS preloaded data is selected, the user can also visually confirm the location of a gage and navigate to the USGS gage details and Water Year Summary websites.

  • Trend Analysis — The Trend Analysis Module applies three trend hypothesis tests and provides trend line coefficients (i.e., Traditional and Sen’s Slope).

  • Nonstationarity Detection — The Nonstationarity Detector sheet uses different statistical methods (i.e., changepoint tests, breakpoint analysis) to detect evidence of nonstationarity in the period of record.

  • Time Series Modeling — For deeper inspection, the last sheet fits a time series model to the uploaded data. Users can choose between a Time Series Linear Model, an Auto-Regressive Integrated Moving Average Model, and an Exponential Smoothing Model, extracting both model fit statistics and the model’s forecasts for the uploaded data.


Annual and monthly data updated as of December 2025

If you have any questions or comments, please let us know by contacting our team: iirsupport@usace.army.mil



1. Select Data Source

Seasonal cycle visualization is only available for monthly datasets. Daily data must first be aggregated to monthly data before this visualization can be used.

2. Select Search Method

2. Upload Data Set

Define the path to the file you want to upload. It should be a CSV file with two columns: the first is the date vector (mm/dd/yyyy, mm-dd-yyyy, or yyyy) and the second is the data for analysis. The first row should contain the column headers.
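As a rough illustration of the file layout described above, the following Python sketch reads a two-column CSV and accepts the three date layouts. It is not the tool's own loader: the function names (`parse_date`, `read_series`), the sample data, and the choice to treat a blank value as missing are assumptions made for this example.

```python
import csv
import io
from datetime import date, datetime

# Date layouts named in the upload instructions: mm/dd/yyyy, mm-dd-yyyy, yyyy.
DATE_FORMATS = ("%m/%d/%Y", "%m-%d-%Y", "%Y")

def parse_date(text):
    """Try each accepted layout in turn; a bare year maps to January 1."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def read_series(handle):
    """Return a list of (date, value) pairs; blank values become None."""
    reader = csv.reader(handle)
    next(reader)  # the first row holds the column headers
    rows = []
    for record in reader:
        if not record:
            continue  # skip empty lines
        raw = record[1].strip() if len(record) > 1 else ""
        rows.append((parse_date(record[0]), float(raw) if raw else None))
    return rows

# Hypothetical three-row upload mixing the accepted date layouts.
sample = "date,flow\n10/01/1990,120.5\n11-01-1990,\n1991,98\n"
series = read_series(io.StringIO(sample))
```

Checking a file this way before uploading can catch unsupported date layouts early.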

3. Apply Preprocessing Methods


Note: Zoom into any plot by clicking and dragging your mouse.



Missing Values
Total Data Points

Time Series Data Table

Start Date
End Date
Total Entries
Frequency
Missing Values
Autocorrelation (Lag 1)
** Years selected from the Data Upload tab will be applied to this data summary.

Summary Statistics*

AR Lag Correlation

Summary Statistics Description

Before formally testing a dataset for nonstationarity, Exploratory Data Analysis (EDA) is a crucial initial step. This involves examining key dataset properties relevant to subsequent analysis. The first seven summary statistics presented, often referred to as the "Magnificent Seven" (Archfield et al., 2014), provide a foundation for characterizing the data's statistical behavior. In addition to these, the presented summary statistics include Anderson-Darling test results and an Autocorrelation Function (ACF) plot. These outputs help assess whether the dataset meets the underlying assumptions required for statistical methods applied in later stages of the analysis. Please note: summary statistics are generated using the time window specified within the Data Upload tab. The values below are linear combinations of order statistics (L-moments) rather than ordinary moments. Sample L-moments are defined in Hosking (1990).
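To make the idea of "linear combinations of order statistics" concrete, here is a minimal Python sketch of the first four sample L-moments, computed from unbiased probability-weighted moments as in Hosking (1990). It is an illustration only, not the Fortran routines the tool cites; the function name and the returned dictionary keys are assumptions.

```python
from math import comb

def sample_l_moments(values):
    """Return L-mean, L-scale, and the L-CV, L-skewness, L-kurtosis ratios."""
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    # Unbiased probability-weighted moments b_0..b_3: the (i+1)-th smallest
    # value is weighted by C(i, r) / C(n - 1, r).
    b = [
        sum(comb(i, r) * x[i] for i in range(n)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    nan = float("nan")
    return {
        "l_mean": l1,
        "l_scale": l2,
        "l_cv": l2 / l1 if l1 != 0 else nan,        # undefined for a zero mean
        "l_skewness": l3 / l2 if l2 != 0 else nan,  # undefined for constant data
        "l_kurtosis": l4 / l2 if l2 != 0 else nan,
    }

# Evenly spaced values are symmetric, so the L-skewness should be zero.
stats = sample_l_moments([1, 2, 3, 4, 5])
```

Because each L-moment is a weighted sum of the sorted values, a single extreme observation influences these statistics less than it influences conventional skewness or kurtosis.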

1. L-Mean — The average value of the data, a measure of location. For reference, the standard normal distribution has an L-Mean of 0. Calculations use the algorithm given in Hosking (1996, p.14).

2. L-CV (Coefficient of L-Variation) — The ratio of the L-scale (the second L-moment) to the L-mean, analogous to the conventional ratio of standard deviation to mean. For positive-valued data, this coefficient takes values between 0 and 1. If the mean of the data set is zero, the coefficient is undefined and cannot be calculated. Please see Hosking (1990) and Hosking (1996) for more information.

3. L-Skewness — The measure of asymmetry of the probability distribution. A normal distribution has a skew of zero, while a lognormal distribution, for example, would exhibit some degree of right-skew. L-skewness ranges from -1 to 1, with values greater than 0.300 indicative of large skewness. Please see Hosking (1990) and Hosking (1996) for more information.

4. L-Kurtosis — The measure of tail density of the probability distribution (i.e., how much of the distribution is contained in the tails). For reference, a uniform distribution has an L-Kurtosis of 0, while a normal distribution has one of approximately 0.123. Please see Hosking (1990) and Hosking (1996) for more information.

5. AR1 — The autoregressive lag-one correlation coefficient (i.e., how predictive the previous value in the time series is of the next value). Long-term monthly means are used to deseasonalize the data. The code normalizes the data and then fits a first-order autoregressive model using the Yule-Walker method. Values can be positive or negative.

Please refer to the autocorrelation function (ACF) plot to visually assess this relationship: the AR1 value corresponds to the correlation at lag 1 in the correlogram.

6. Amplitude — A measure of the height of the best-fitting annual sinusoidal curve. Amplitude values are always positive numbers (e.g., 4, 1.5, 108). First, flows are standardized, then fitted to the linear model a*cos(2*pi*t) + b*sin(2*pi*t), where t is time in years and a and b are the fitted coefficients. The final value is calculated as sqrt(a^2 + b^2).

7. Phase — The measure, in radians, of the angle of the best-fitting annual sinusoidal curve at time zero. Using radians, each of the values will be between −π and π. The same pre-processing steps used for calculating Amplitude are used to calculate Phase. However, the final value is calculated as arctan(-b/a), where b is the fitted sin(2*pi*t) coefficient and a is the fitted cos(2*pi*t) coefficient.

8. Normality — The Anderson-Darling (AD) test is applied to determine whether the sample of data is drawn from a normal distribution. The test relies on the test statistic A*² to detect departures from normality. If A*² exceeds a given critical threshold, then the hypothesis of normality is rejected at the corresponding significance level. At a 5% significance level, normality is rejected if A*² exceeds 0.754.
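The AR1, Amplitude, and Phase descriptions above can be sketched in Python as follows. This is a hedged illustration rather than the tool's code. It assumes monthly data, decimal-year time stamps, an intercept in the sinusoidal fit, and population standard deviation for standardizing; the helper names are invented for the example. Here a and b denote the fitted coefficients on cos(2*pi*t) and sin(2*pi*t).

```python
import math
from statistics import fmean, pstdev

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def ar1(months, values):
    """Lag-one coefficient of the deseasonalized, standardized series."""
    by_month = {}
    for m, v in zip(months, values):
        by_month.setdefault(m, []).append(v)
    monthly_mean = {m: fmean(vs) for m, vs in by_month.items()}
    z = standardize([v - monthly_mean[m] for m, v in zip(months, values)])
    # For an AR(1) model the Yule-Walker estimate is the lag-1 autocorrelation.
    return sum(p * q for p, q in zip(z, z[1:])) / sum(p * p for p in z)

def least_squares(columns, y):
    """Solve the normal equations (X'X) beta = X'y by Gauss-Jordan elimination."""
    k = len(columns)
    aug = [
        [sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(p * q for p, q in zip(columns[i], y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * w for u, w in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def amplitude_phase(decimal_years, values):
    """Fit z = c + a*cos(2*pi*t) + b*sin(2*pi*t) to standardized values."""
    z = standardize(values)
    cos_t = [math.cos(2 * math.pi * t) for t in decimal_years]
    sin_t = [math.sin(2 * math.pi * t) for t in decimal_years]
    _, a, b = least_squares([[1.0] * len(z), cos_t, sin_t], z)
    # atan follows the arctan(-b/a) formula in the text; it fails when a == 0.
    return math.hypot(a, b), math.atan(-b / a)

# Synthetic record: four years of a pure annual sinusoid around a mean of 10.
t = [2000 + y + (m - 0.5) / 12 for y in range(4) for m in range(1, 13)]
flows = [10 + 3 * math.cos(2 * math.pi * x) + 4 * math.sin(2 * math.pi * x) for x in t]
amplitude, phase = amplitude_phase(t, flows)

# Synthetic record: a fixed seasonal cycle plus a year-to-year offset.
months = [m for _ in range(3) for m in range(1, 13)]
shifted = [m + offset for offset in (-1.0, 0.0, 1.0) for m in range(1, 13)]
persistence = ar1(months, shifted)
```

Note that `math.atan` returns values between −π/2 and π/2. `math.atan2(-b, a)` would cover the full −π to π range, but that is a different convention from the formula as written.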

AR Lag Correlation Description

ACF: The autocorrelation function (ACF) is depicted graphically using a correlogram, which shows the autocorrelation at each lag along with 95% confidence limits. Any value that falls outside these confidence limits can be considered significantly autocorrelated at the 5% significance level. Autocorrelation is calculated by shifting a time series by a specific interval, known as a lag, and computing its correlation with the original, unshifted series. By definition, the autocorrelation at lag zero is always 1, as this represents the correlation of the series with itself. When autocorrelation at non-zero lags is statistically insignificant, the series can be considered random or statistically independent. In hydrology, this concept is critical. For instance, positive autocorrelation in annual peak streamflow indicates system 'memory' or persistence; it means a high-flow year is more likely to be followed by another. The most immediate interval, Lag-1, is often the most critical indicator of this short-term persistence. When examining many lags (e.g., 20 or more), it becomes statistically probable that at least one will appear significant purely by random chance, so an isolated exceedance should be interpreted with caution.
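The shift-and-correlate calculation described above can be sketched in a few lines of Python. The 95% limits here use the common white-noise approximation of ±1.96/sqrt(n); whether the tool computes its limits exactly this way is an assumption, as are the function names.

```python
from statistics import fmean

def acf(values, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(values)
    mu = fmean(values)
    dev = [v - mu for v in values]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

def white_noise_limit(n, z=1.96):
    """Approximate 95% limit for the autocorrelation of an independent series."""
    return z / n ** 0.5

def significant_lags(values, max_lag):
    """Non-zero lags whose autocorrelation falls outside the limits."""
    limit = white_noise_limit(len(values))
    r = acf(values, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > limit]

# Hypothetical annual peak flows with a slow rise, hence positive persistence.
peaks = [10, 11, 13, 12, 15, 16, 18, 17, 20, 22, 21, 24, 25, 27, 26, 29]
r = acf(peaks, 4)
flagged = significant_lags(peaks, 4)
```

The value `r[1]` corresponds to the lag-1 bar of the correlogram.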

References

1. Archfield, S. A., Kennen, J. G., Carlisle, D. M., and Wolock, D. M. (2014). An objective and parsimonious approach for classifying natural flow regimes at a continental scale. River Research and Applications, 30, 1166-1183. doi: 10.1002/rra.2710

2. Hosking, J. R. M. (1990). L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B, 52, 105-124.

3. Hosking, J. R. M. (1996). Fortran routines for use with the method of L-moments, Version 3. Research Report RC20525, IBM Research Division, Yorktown Heights, N.Y.


Seasonal Cycle Graph

The seasonal cycle graph shows monthly data aggregated across all years of record. For each calendar month, it displays the minimum, maximum, and average values, which together indicate how much that month varies from year to year.
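A minimal Python sketch of this aggregation is shown below, assuming the input is a list of (year, month, value) records and that missing values are represented as None. Both the record layout and the function name are assumptions for illustration.

```python
from collections import defaultdict
from statistics import fmean

def seasonal_cycle(records):
    """Map each calendar month to its (minimum, mean, maximum) across years."""
    by_month = defaultdict(list)
    for _year, month, value in records:
        if value is not None:  # skip missing observations
            by_month[month].append(value)
    return {m: (min(v), fmean(v), max(v)) for m, v in sorted(by_month.items())}

# Hypothetical mean monthly flows for two months over two years.
records = [
    (2000, 1, 5.0),
    (2001, 1, 7.0),
    (2000, 2, 3.0),
    (2001, 2, None),
]
cycle = seasonal_cycle(records)
```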

Seasonal Cycle Data



Analyze Trends


Use this page to detect the presence and severity of trends. It is recommended to have at least 25 years in the analysis period. Please update the years first and then perform the analysis on your data.




Interpretation of Trend Significance & Strength

Kendall's Tau:

Kendall's Tau (τ) is a measure of the trend's direction and strength, ranging from -1 (perfect decreasing trend) to +1 (perfect increasing trend).


Range of p-values    Descriptors (Hirsch et al. 2015)    USACE Evaluation
>0.33                About as Likely as Not              Insufficient Evidence
>0.1 and ≤0.33       Likely                              Weak Evidence
>0.05 and ≤0.1       Very Likely                         Moderate Evidence
≥0 and ≤0.05         Highly Likely                       Strong Evidence
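The Python sketch below shows one way Kendall's Tau, Sen's Slope, and the p-value categories above fit together. It assumes a record without tied values and uses the normal approximation of the Mann-Kendall test for the p-value. The guide does not state which three tests the tool runs, so that choice, along with all function names, is an assumption.

```python
import math
from statistics import median

def sign(x):
    return (x > 0) - (x < 0)

def kendall_tau(values):
    """Return (tau, S) for values in time order; ties are not corrected for."""
    n = len(values)
    s = sum(sign(values[j] - values[i]) for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2), s

def sens_slope(times, values):
    """Median of the slopes between every pair of observations."""
    n = len(values)
    return median(
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(n) for j in range(i + 1, n)
    )

def mann_kendall_p(s, n):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return math.erfc(abs(z) / math.sqrt(2))

def evidence(p):
    """Map a p-value to the descriptor and evaluation from the table."""
    if p <= 0.05:
        return ("Highly Likely", "Strong Evidence")
    if p <= 0.1:
        return ("Very Likely", "Moderate Evidence")
    if p <= 0.33:
        return ("Likely", "Weak Evidence")
    return ("About as Likely as Not", "Insufficient Evidence")

# Hypothetical annual series with a general upward tendency.
years = list(range(2000, 2010))
peaks = [12, 15, 11, 18, 17, 21, 19, 24, 23, 27]
tau, s = kendall_tau(peaks)
slope = sens_slope(years, peaks)
label = evidence(mann_kendall_p(s, len(peaks)))
```

Sen's Slope is the median of all pairwise slopes, which makes it less sensitive to individual outliers than an ordinary least-squares trend line.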


Trend Line Coefficients


Trend Hypothesis Test



Identify Seasonality


Seasonal Decomposition: This tool uses a series of statistical methods to identify and define seasonal patterns in the data. These techniques take into account underlying trends in the data, as well as noise and natural variability.
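The guide does not name the decomposition method the tool applies. Purely to illustrate how a series can be split into trend, seasonal, and noise components, the Python sketch below implements a classical additive decomposition with a centered moving average. The function name and the synthetic data are assumptions.

```python
from statistics import fmean

def decompose_additive(values, period=12):
    """Split values into trend, seasonal, and noise lists (value = sum of all three)."""
    n, half = len(values), period // 2
    if n < 2 * period:
        raise ValueError("need at least two full cycles")
    trend = [None] * n  # undefined near both ends of the record
    for i in range(half, n - half):
        window = values[i - half:i + half + 1]
        if period % 2 == 0:
            # Centered moving average: the two end points get half weight.
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
        else:
            trend[i] = fmean(window)
    # Average the detrended values at each position within the cycle.
    groups = [[] for _ in range(period)]
    for i, tr in enumerate(trend):
        if tr is not None:
            groups[i % period].append(values[i] - tr)
    raw = [fmean(g) for g in groups]
    centre = fmean(raw)  # force the seasonal effects to average to zero
    seasonal = [raw[i % period] - centre for i in range(n)]
    noise = [
        None if tr is None else v - tr - s
        for v, tr, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, noise

# Synthetic monthly series: linear trend plus a repeating zero-sum pattern.
pattern = [3, 1, -2, -4, -3, 0, 2, 4, 1, -1, -2, 1]
series = [100 + 0.5 * i + pattern[i % 12] for i in range(48)]
trend, seasonal, noise = decompose_additive(series)
```

On this noise-free synthetic series, the seasonal component reproduces `pattern` and the noise component is effectively zero wherever the trend is defined.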


Define decomposition analysis:

** Years selected from the trend analysis page will be applied to the seasonality analysis.


Seasonal Components (Download and Run Analysis)

Perform time series analysis with selected component:



Download data and graphs:

Seasonal Decomposition of Uploaded Data


Original Series


Trend Component


Seasonality Component


Noise Component




Detect Nonstationarities


The Stationarity Assumption: This tool uses regression techniques to fit trend lines to the data. The slopes produced by the regression techniques can be used as optimized trend measurements.

Nonstationarity Analysis: This tool uses statistical testing to detect the presence of nonstationarities in the uploaded data. These tests examine the data for nonstationarities (or changes) in the data mean, variance, or distribution.

Large Datasets and Computational Complexity: There are three computationally expensive tests: Energy Divisive, Smooth Lombard Wilcoxon, and Smooth Lombard Mood. When the checkbox to remove computationally expensive tests is selected, these three tests are excluded and the analysis runs only the remaining nine methods.

*** If some nonstationarities shown in the time series graphic do not appear on the heatmap, please try expanding the graph by enlarging the application screen.

The USGS streamflow gage sites available for assessment within this application include locations where there are discontinuities in USGS peak flow data collection throughout the period of record and gages with short records. Engineering judgment should be exercised when carrying out analysis where there are significant data gaps.

In general, a minimum of 30 years of continuous streamflow measurements must be available before this application should be used to detect nonstationarities in flow records.



Gage Drainage Area (SQM)
Plot Check

Method Explorer

** All tests detect abrupt changes except for the Smooth Lombard Mood and Smooth Lombard Wilcoxon tests.
** All tests are nonparametric except for the Bayesian Changepoint Test.
** All tests are change point tests except the Breakpoint method. For more details on detected breakpoints, see the Breakpoint tab.
** Begin the analysis by clicking the 'Run Nonstationarity Analysis' button.

Sensitivity Parameters

(Sensitivity parameters are described in the manual. Engineering judgment is required if non-default parameters are selected.)


Nonstationarity Detection



Test for Breakpoints


Approach: This tool uses linear regression, and the analysis of model errors with hypothesis testing, to identify points in the data that reflect sharp changes in behavior, suggesting the need for segmented analysis.


Missing Values: Breakpoint analysis models typically struggle to handle missing values. To facilitate robust analysis, this function concatenates data across missing values to close gaps in the data before detecting breakpoints. The final visualization and analysis maintain your originally selected approach to handling missing values, but the underlying methodology reduces the possibility that missing values misalign analytic insights.


Trend Significance: To aid in the understanding of the breakpoints, this tool has the option to show segment trend lines. For these individual segments, it also provides significance testing using the t-test. Within the data table, the slope value is marked with *, **, or *** when the p-value of the test indicates a significant trend at the 0.33 (*), 0.1 (**), or 0.05 (***) level.





Metrics used in determining optimal breakpoints

Residual Sum of Squares (RSS):

The sum of the squared residuals, where a residual measures how far the regression line is from an original data point. RSS measures the amount of variance in a dataset that is not explained by the regression model. Both RSS and BIC are directly used in the selection of breakpoints.

Bayesian Information Criterion (BIC):

An index, based on Bayesian statistics, that is used to determine which model is best for a given dataset. In this case, the criterion helps determine an optimal number of structural breaks. The BIC adds a penalty term, which favors more parsimonious models over more complex models. This penalty term helps prevent overfitting.

Root Mean Square Error (RMSE):

The standard deviation of the residual values. The residual value is a measure of how far the regression line is from the original data.
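The Python sketch below shows how the three metrics relate when comparing a single-line fit against a fit with one breakpoint. It is illustrative only. The Gaussian form of BIC with constant terms dropped, the parameter counts, the minimum segment size, and the function names are all assumptions and may differ from what the tool computes.

```python
import math

def line_rss(t, y):
    """Residual sum of squares of an ordinary least-squares line through (t, y)."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((a - t_mean) ** 2 for a in t)
    slope = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(t, y))

def segmented_rss(t, y, breaks):
    """Total RSS when a separate line is fitted to each segment."""
    edges = [0, *breaks, len(t)]
    return sum(line_rss(t[a:b], y[a:b]) for a, b in zip(edges, edges[1:]))

def bic(rss, n, n_params):
    """Gaussian BIC up to an additive constant: fit term plus complexity penalty."""
    return n * math.log(rss / n) + n_params * math.log(n)

def rmse(rss, n):
    return math.sqrt(rss / n)

def best_single_break(t, y, min_size=5):
    """Index of the first point of the second segment that minimizes total RSS."""
    candidates = range(min_size, len(t) - min_size + 1)
    return min(candidates, key=lambda k: segmented_rss(t, y, [k]))

# Synthetic record: a level shift from about 10 to about 20 at index 15,
# with a small alternating disturbance so that the residuals are non-zero.
t = list(range(30))
y = [(10 if i < 15 else 20) + (0.5 if i % 2 == 0 else -0.5) for i in t]

k = best_single_break(t, y)
rss_none, rss_one = segmented_rss(t, y, []), segmented_rss(t, y, [k])
# Assumed parameter counts: 2 per line, plus 1 for the break location.
bic_none, bic_one = bic(rss_none, len(t), 2), bic(rss_one, len(t), 5)
rmse_none, rmse_one = rmse(rss_none, len(t)), rmse(rss_one, len(t))
```

Adding a break can only lower the RSS, so the `n_params * log(n)` penalty is what keeps the criterion from always favoring more segments.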



Breakpoint Segment Details


Build Time Series Models


This section helps users determine the appropriate time series model by using techniques that control for seasonality, trend, and nonstationarities and visualizing outputs.

Select a time series model:




Time Series Linear Model parameters:

ARIMA Model parameters:

ETS Model parameters:




Residuals Over Time


Residuals Autocorrelation


Instruction Documents

Installing DOD Certificates

Application Resources

To learn more about the tool, we strongly recommend reading the user guide.

