American Association of State Highway and Transportation Officials
Special Committee on Research and Innovation
FY2023 NCHRP PROBLEM STATEMENT TEMPLATE
Problem Number: 2023-A-02
Problem Title
Enterprise Data Warehouse Implementation Guide
Background Information and Need For Research
As part of a robust data governance strategy, agencies must decide how best to
manage storage, access, and dissemination of data products and services, both
for internal use/reuse and external distribution, within their data
architecture. An
important piece of a modern data architecture is an enterprise data warehouse,
which, conceptually, will provide a way to reduce data redundancy, improve data
consistency, and enable data usage for better decisions. Effective implementation
of a data warehouse is complex, especially when an entity has highly diverse
data sets and technology infrastructure. DOTs would therefore benefit greatly
from guidance on how best to architect and implement an enterprise data
warehouse strategy that meets diverse business needs while remaining
performant. However, the industry lacks a guidebook on how to develop and
implement a comprehensive data warehouse for a DOT.
The guidance should cover the complete set of functional requirements,
including but not limited to federation (aggregating data from multiple
sources), extract-transform-load (ETL) and extract-load-transform (ELT)
processes, storage, naming conventions, structure (model), roles and
responsibilities, web services, data publishing processes, access by
third-party applications, and records management. Additionally, any
conventions, models, and structures promoted by the guidance will need to
support and align with standard frameworks used in transportation, such as
Industry Foundation Classes (IFC), to the greatest extent practical.
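As an illustration only, the difference between the ETL and ELT patterns named
above, and the role of a naming convention when federating sources, can be
sketched in Python using the standard-library sqlite3 module as a stand-in
warehouse. The table names, the dw_ and stg_ prefixes, the route identifiers,
and the sample values are all hypothetical and are not drawn from any DOT
system.

```python
import sqlite3

# Hypothetical source extracts; route IDs are formatted inconsistently on purpose.
PAVEMENT_ROWS = [("us-101 ", 2.5), ("I-5", 3.8)]
TRAFFIC_ROWS = [("US-101", 42000), (" i-5", 98000)]


def normalize_route(route_id: str) -> str:
    """Transform step: trim whitespace and upper-case the route identifier."""
    return route_id.strip().upper()


def run_etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code first, then load the clean rows."""
    conn.execute(
        "CREATE TABLE dw_pavement_condition (route_id TEXT, condition_index REAL)"
    )
    clean = [(normalize_route(route), index) for route, index in PAVEMENT_ROWS]
    conn.executemany("INSERT INTO dw_pavement_condition VALUES (?, ?)", clean)


def run_elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows into staging, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE stg_traffic_volume (route_id TEXT, aadt INTEGER)")
    conn.executemany("INSERT INTO stg_traffic_volume VALUES (?, ?)", TRAFFIC_ROWS)
    conn.execute(
        "CREATE TABLE dw_traffic_volume AS "
        "SELECT UPPER(TRIM(route_id)) AS route_id, aadt FROM stg_traffic_volume"
    )


def federated_view(conn: sqlite3.Connection) -> list:
    """Federation: join two subject areas on the shared, normalized route key."""
    return conn.execute(
        "SELECT p.route_id, p.condition_index, t.aadt "
        "FROM dw_pavement_condition AS p "
        "JOIN dw_traffic_volume AS t ON t.route_id = p.route_id "
        "ORDER BY p.route_id"
    ).fetchall()


def build_warehouse() -> sqlite3.Connection:
    """Build an in-memory warehouse using both loading patterns."""
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    run_elt(conn)
    return conn


if __name__ == "__main__":
    print(federated_view(build_warehouse()))
    # prints [('I-5', 3.8, 98000), ('US-101', 2.5, 42000)]
```

In this sketch the two sources only join because both loading paths apply the
same key convention. That is the kind of decision (where the transformation
runs, and which conventions are shared) that the guidance would be expected to
address for production-scale platforms.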
Literature Search Summary
The value of data warehouses for transportation has been discussed since 1997,
and the practice has continued to be developed and refined for specific
business needs. However, the guidance has thus far been piecemeal; there is no
complete guidebook on how to develop and implement a comprehensive data
warehouse for a DOT.
From the late 1990s to the early 2000s, data warehouse development in DOTs
included one effort focused on supporting a single business unit with a
programming and scheduling data mart, which included recommended approaches,
and another with a broader scope of transportation data and an “analytical
toolkit,” which included a discussion of business requirements and the
resultant design approach.
In the more recent past, literature on data warehousing in the transportation
realm has included design options for electric mobility, basic design needs
for a maritime data warehouse, a regional data warehouse to support a
real-time incident detection system, and a state-focused data warehouse
designed for analyzing traffic safety improvements/advancements using eight
integrated data sources, including health/medical, traffic incident, driving
record, court record, and census data. Notably, none of these reports
references detailed guidelines that a DOT could leverage to develop and
implement a comprehensive data warehouse for the entirety of an agency’s data.
One recent research report evaluated open data portals for the “quality of
data, ease of use, and availability of metadata” across many transportation
agencies and found wide variations, which highlighted the need for DOTs to
improve data management and access toward a more standardized and accessible
benchmark for a “ubiquitous audience”. The report highlighted the
best-evaluated portal and its qualities but did not provide a guide for
replicating the portal.
Finally, an NCHRP report on the effect of digitization on DOTs, published in
2020 and including a DOT survey, highlighted the need for new research or
synthesis on “best practices of enterprise data warehouse management”.
Research Objective
This research is intended to develop a guidebook on data warehouse
implementation to support efficient use, sharing, and reporting of data and to
address transportation agency business needs. The deployable product from this
research is a best practice guide for departments of transportation (DOTs) to
use in guiding the development and implementation of an enterprise data
warehousing strategy that encompasses data in structured (tabular),
semi-structured or unstructured (non-tabular), and geospatial (including
geometry) formats.
This study should broadly consider the needs of the state DOTs and provide
guidance for assessing data warehouse opportunities across the business needs
within an organization, as well as the strategies used to develop, implement,
and integrate the use of data warehouses. It is expected that the guidance
will address data in structured (tabular), semi-structured or unstructured
(non-tabular), and geospatial formats.
The product should provide guidance on:
• Strategies to assess current data
warehouse capabilities and opportunities.
• A glossary that defines terms and
operating procedures, keeping in mind the diverse operational contexts of
different state DOTs.
• Identification of essential data
warehouse capabilities (input, access, cybersecurity/permission management,
privacy, data quality, etc.).
• The data management strategies and
technology needed to provide capabilities for the development and
implementation of data warehouses.
• Factors critical for the adoption of
data warehouses by agencies and the business functions the warehouse supports.
This is anticipated to include surveys or interviews with state DOT Chief Data
Officers or equivalent, as well as business domain experts for defining
reporting and analytical requirements.
Urgency and Potential Benefits
Data resources have been developed by independent business areas to meet their
own needs, which has often resulted in data silos and challenges in accessing
data for analysis and for effective and timely decision making. Business
practices have evolved to become more interdisciplinary, which is increasing
the need to access data across business units. Data warehousing provides a
cost-effective strategy to align and access cross-organization data resources.
Data warehouses support efficient analysis and reporting by providing a single
and stable source of integrated data that is commonly needed by decision
makers. With limited resources for data management, data warehouses provide a
cost-effective means to support the multifaceted needs of state DOTs.
Benefits of data warehousing include:
• Improved daily decision making through
better data quality and increased accessibility and security, achieved by
using inventory best practices.
• Improved organizational productivity
through better interoperability of data and automation of common analysis and
reporting activities.
• Time and cost savings to agencies from
guidance on enterprise data warehouse strategies that can be quickly applied
to improve the current state of data, rather than duplicating efforts with
multiple or unknown outcomes.
• Timelier and more efficient sharing of
data across business units and agencies through the adoption of common
standards.
• A historic record of agency data that
is built and maintained separately from highly dynamic operational data
stores.
• More efficient design and
implementation of collaborative efforts such as the AASHTOWare product
portfolio.
• Development of common performance
metrics.
Implementation Considerations
Each state DOT needs to manage its information systems within its state's
requirements. This guidance will provide useful information for agency policy
and procedure to support each agency's strategies for data warehouse
development and management.
To support implementation, research activities and products will be shared
through:
• The AASHTO Committee on Data Management
& Analytics meetings, forums, webinars, and website
• The TRB Standing Committee on
Statewide/National Transportation Data and Information Systems
• The TRB Standing Committee on
Information Systems and Technology
• AASHTO GIS-T symposium, ITE conference,
and USDOT meetings and webinars
• Outreach to other AASHTO committees
that would benefit from the guide
Recommended Research Funding And Research Period
Recommended Funding: $350,000
Research Period: 18 months
Problem Statement Author(s): For each author,
provide their name, affiliation, email address and phone.
Buffy Conrad, Enterprise Data Governance Manager, Maryland State Highway
Administration, BConrad@mdot.maryland.gov, 410-545-8405
Leni Oman, Knowledge Strategist, WSDOT, omanl@wsdot.wa.gov, 260-705-7974
Michael Pipp, Chief Data Officer, Montana Dept. of Transportation,
mipipp@Mt.gov, 406-444-6060
Jack Dartman, Supervisor/Architect, Montana Dept. of Transportation,
jdartman@mt.gov, 406-444-7937
Chad Baker, Geospatial Data Officer, Caltrans, chad.baker@dot.ca.gov,
916-247-1625
Potential Panel Members: For each panel member,
provide their name, affiliation, email address and phone.
Michael Pipp, Chief Data Officer, Montana Dept. of Transportation,
mipipp@Mt.gov, 406-444-6060
Jack Dartman, Supervisor/Architect, Montana Dept. of Transportation,
jdartman@mt.gov, 406-444-7937
Chad Baker, Geospatial Data Officer, Caltrans, chad.baker@dot.ca.gov,
916-247-1625
Denis Whitney-Dahlke, Strategic Data Program Manager, Oregon Department of
Transportation, Denise.D.WHITNEY-DAHLKE@odot.state.or.us, 971-719-6274
Person Submitting The Problem Statement: Name,
affiliation, email address and phone.
Chad Baker, Geospatial Data Officer, Caltrans, on behalf of the AASHTO
Committee on Data Management and Analytics, chad.baker@dot.ca.gov,
916-247-1625