AI and Open Government Data Assets Request for Information
Metadata and text below are from the Federal Register, a public-domain U.S. government work. Always verify the official published version before relying on it for any legal matter.
<html>
<head>
<title>Federal Register, Volume 89 Issue 75 (Wednesday, April 17, 2024)</title>
</head>
<body><pre>
[Federal Register Volume 89, Number 75 (Wednesday, April 17, 2024)]
[Notices]
[Pages 27411-27413]
From the Federal Register Online via the Government Publishing Office [<a href="http://www.gpo.gov">www.gpo.gov</a>]
[FR Doc No: 2024-08168]
=======================================================================
-----------------------------------------------------------------------
DEPARTMENT OF COMMERCE
[Docket No. 240410-0103]
RIN 0690-XD001
AI and Open Government Data Assets Request for Information
ACTION: Notice, request for information.
-----------------------------------------------------------------------
SUMMARY: The U.S. Department of Commerce is committed to advancing
transparency, innovation, and the responsible use and dissemination of
public data assets, including for use by data-driven AI technologies.
To this end, we are pleased to issue this Request for Information (RFI)
to seek valuable insights from industry experts, researchers, civil
society organizations, and other members of the public on the
development of AI-ready open data assets and data dissemination
standards.
DATES: Comments must be received on or before July 16, 2024.
ADDRESSES: All electronic public comments on this action, identified by
<a href="http://Regulations.gov">Regulations.gov</a> docket number DOC-2024-0007, may be submitted through
the Federal e-Rulemaking Portal at <a href="http://www.regulations.gov">www.regulations.gov</a>. The docket
established for this request for comment can be found at
<a href="http://www.regulations.gov">www.regulations.gov</a>, DOC-2024-0007. Click the ``Comment Now!'' icon,
complete the required fields, and enter or attach your comments.
FOR FURTHER INFORMATION CONTACT: Please direct questions regarding this
Notice to Victoria Houed at ContactOUSEA@doc.gov with ``AI-Ready Open
Data Assets RFI'' in the subject line, or if by mail, addressed to
Victoria Houed, OUSEA, U.S. Department of Commerce, 1401 Constitution
Avenue NW, Room 4848, Washington, DC 20230; telephone: (202) 913-1504.
SUPPLEMENTARY INFORMATION: The U.S. Department of Commerce (Commerce)
is committed to leading the way in producing and disseminating high-
quality public data. Commerce's data assets enable U.S. scientific
discovery, innovation, and economic growth, serving as an invaluable
national resource. In its mission to publish data for the American
public and achieve its strategic goal to ``expand opportunity and
discovery through data,'' Commerce is dedicated to continuously
refining its processes for creating, curating, and distributing its
data as new technologies emerge. This Request for Information (RFI)
seeks to understand ways to improve Commerce's creation, curation, and
distribution of its open data assets to facilitate the development and
advancement of AI technologies such as generative AI.
Commerce, as a premier data provider, has a long history of
adapting to technological change. In the past 40 years, Commerce has
moved data publication efforts into electronic forms, and in the past
20 years, that has included the provision of both data services and
tools to support discovery and exploration of Commerce's data. In the
last five years, Title II of the Foundations for Evidence-Based
Policymaking Act, commonly known as the OPEN Government Data Act,
established Commerce's commitment to disseminating open data assets in
[[Page 27412]]
machine-readable formats, or ``data in a format that can be easily
processed by a computer without human intervention while ensuring no
semantic meaning is lost'' (44 U.S.C. 3502(18)).
Today, Commerce is facing a new technological change with the
emergence of AI technologies that provide improved information and data
access to users. Commerce is specifically interested in generative AI
(GenAI) applications, which digest disparate sources of text, images,
audio, video, and other types of information to produce new content.
GenAI and other AI technologies present opportunities and challenges
for both data providers, such as Commerce, and data users, including
other government entities, industry, academia, and the American people.
AI has brought transformative changes to many industries, including
health, finance, education, and transportation, while GenAI has the
promise of democratizing access to data by enabling the average person
to engage with data in ways that had not previously been possible.
Recent GenAI tools allow users to input simple prompts to engage with
content gathered by these tools from a wide range of sources, including
Commerce's public data.
The challenge for Commerce, as an authoritative provider of data,
is to ensure that these new AI intermediaries can appropriately access
its data without compromising the integrity, including the quality, of
that data. AI tools require large amounts of trustworthy information
to respond accurately to the needs of their users. As AI applications become more
sophisticated and ingrained in everyday life, the role of high-quality
data becomes increasingly critical. Commerce acknowledges, as a key
data producer, that in order for AI systems to utilize its data for
training and for instant data retrieval, its data may need to be
reconfigured in easily consumable formats. AI tools are increasingly
used for data analysis and data access, so Commerce hopes to ensure
that the data these tools consume is easily accessible and ``machine
understandable,'' rather than merely ``machine readable.'' Therefore, this RFI
explores how to achieve better data integrity, accessibility, and
quality for emerging AI technologies.
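As a purely illustrative sketch, the distinction between ``machine
readable'' and ``machine understandable'' data can be shown in Python.
This is not a Commerce specification or system. The CSV text, the
column name ``UR,'' its made-up values, and the COLUMN_SEMANTICS mapping
are all hypothetical. Parsing the file makes the data machine readable.
Pairing each value with a stated meaning and unit is one simple way to
move toward machine-understandable data.

```python
import csv
import io

# Hypothetical CSV extract with made-up values (not actual statistics).
# It is parseable ("machine readable"), but the header "UR" alone does
# not say what the numbers mean.
RAW_CSV = "state,UR\nVA,2.9\nMD,2.7\n"

# Hypothetical per-column semantics. In practice these might come from
# published metadata or a knowledge graph rather than a hard-coded dict.
COLUMN_SEMANTICS = {
    "state": {"label": "U.S. state (postal abbreviation)", "unit": None},
    "UR": {"label": "unemployment rate", "unit": "percent"},
}


def read_rows(text):
    """Machine readable: parse the CSV text into dicts of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def describe(row):
    """Machine understandable: pair each value with its meaning and unit."""
    parts = []
    for column, value in row.items():
        meta = COLUMN_SEMANTICS[column]
        unit = f" {meta['unit']}" if meta["unit"] else ""
        parts.append(f"{meta['label']}: {value}{unit}")
    return "; ".join(parts)


rows = read_rows(RAW_CSV)
print(describe(rows[0]))
```

The dictionary only stands in for the semantics a program would need.
Publishing such descriptions alongside the data, rather than leaving
each user to supply them, is the kind of change this RFI asks about.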
What sets emerging technologies such as GenAI apart is that the
interpretation and use of data are no longer performed solely by human
experts (e.g., scientists, engineers, software developers) who bring
their own knowledge and understanding to working with Commerce's data.
This human understanding is grounded in shared disciplinary knowledge
and in the human-readable documentation that Commerce provides with its
published data. AI systems currently lack common knowledge and the
ability to apply it. Although these systems demonstrate fluency and
intelligence, their outputs are often driven by contextual prediction
rather than by higher-order reasoning. Recent AI systems are trained on
tremendous amounts of digital content and generate responses based on
the contextual properties of that content; they do not truly
``understand'' the texts in a meaningful way. While these systems
continue to improve, today's AI systems are fundamentally limited by
their reliance on extensive, unstructured data stores: their outputs
depend on that underlying data rather than on an ability to reason and
make judgments based on comprehension. Knowing this, Commerce seeks to
fulfill its strategic mission to ``expand opportunity and discovery
through data'' by disseminating public data in AI-ready formats while
ensuring no semantic meaning is lost.
To respond to the challenge and realize the opportunity offered by
these new technologies, it is important that Commerce enable AI
systems to access and use its public data assets correctly and
responsibly.
This RFI seeks feedback, recommendations, and suggestions from
industry experts, researchers, civil society organizations, and the
public regarding Commerce's creation, curation, and distribution of
data assets that are specifically designed to facilitate the
development and advancement of AI technologies such as GenAI.
Thus far, Commerce has made efforts to expose its public data
through structured APIs and is developing enriched metadata standards
for describing its data assets. To date, Commerce metadata has focused
on enabling discovery of data assets rather than the use of those data
assets by AI systems, but Commerce sees value in changing this focus.
Commerce seeks to further understand how it can make its data assets
AI-ready.
In particular, Commerce wishes to explore the following:
<bullet> The use of knowledge graphs for variable-level metadata,
allowing systems to better link human terms to data elements;
<bullet> Embracing standardized ontologies such as <a href="http://schema.org">schema.org</a> or
NIEM;
<bullet> Harmonizing and linking our internal ontologies and
vocabularies using knowledge graphs grounded in standardized
ontologies;
<bullet> Gathering internal and external written documentation of
existing data products and:
[cir] Mining them for terminology to use in metadata harmonization
and linking; or
[cir] Releasing them in raw formats for the training of AI models;
<bullet> Adopting data formats which allow for rich metadata as
well as generating metadata ``sidecars'' for more traditional formats
such as CSV or SAS;
<bullet> Using open standards for APIs with the ability to link
into knowledge graphs; and
<bullet> Improving guidance and metadata around appropriate data
usage and licensing for purposes such as research analytics, text-and-
data mining, and AI system ingestion.
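A metadata ``sidecar'' for a CSV file can be pictured as a small
JSON-LD document that uses the schema.org Dataset type. Each column
gets one PropertyValue entry in variableMeasured, which gives
variable-level metadata, and a license field states the usage terms.
The Python sketch below assumes this approach. The file names, URLs,
license page, column descriptions, and the ``.jsonld'' naming convention
are hypothetical placeholders, not actual Commerce products or policy.

```python
import json

# Hypothetical placeholders, not actual Commerce data products or URLs.
DATA_FILE = "example_indicators.csv"
SIDECAR_FILE = DATA_FILE + ".jsonld"  # assumed naming convention
DATA_URL = "https://example.org/data/example_indicators.csv"
LICENSE_URL = "https://example.org/license"  # placeholder license page

# (column header, human-readable description, unit or None)
COLUMNS = [
    ("state", "U.S. state postal abbreviation", None),
    ("UR", "Unemployment rate", "percent"),
]


def build_sidecar():
    """Describe the CSV file as a schema.org Dataset in JSON-LD form."""
    variables = []
    for name, description, unit in COLUMNS:
        # Variable-level metadata: one PropertyValue per CSV column.
        prop = {"@type": "PropertyValue", "name": name,
                "description": description}
        if unit:
            prop["unitText"] = unit
        variables.append(prop)
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example indicators (hypothetical)",
        "description": "Illustrative dataset used only to show a sidecar.",
        "license": LICENSE_URL,
        "distribution": {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": DATA_URL,
        },
        "variableMeasured": variables,
    }


def write_sidecar(path=SIDECAR_FILE):
    """Write the JSON-LD sidecar to disk, next to the data file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_sidecar(), fh, indent=2)


if __name__ == "__main__":
    print(json.dumps(build_sidecar(), indent=2))
```

The W3C CSV on the Web (CSVW) recommendation defines a dedicated
metadata vocabulary for tabular files. It may be an alternative to the
schema.org terms used here, or a complement to them.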
Commerce seeks comment on the topics discussed above and responses
to the following questions:
Data Dissemination Standards
1. What data dissemination standards should Commerce adopt to
support human-readable and machine-understandable public data?
2. What formats, metadata, and documentation should be prioritized
to facilitate AI applications?
3. How does raw data, such as data from sensor networks, differ
from derived data, such as statistical data from the U.S. Census
Bureau, when it comes to metadata standards?
4. What data licensing practices, standards, and usage
considerations should Commerce take into account to support broad,
equitable, and open access to its datasets and metadata?
5. What current standards exist or are under development that
Commerce should consider to clearly signal that its public data is
available for use by AI systems (or signal any accompanying conditions
or restrictions on said data)?
Data Accessibility and Retrieval
1. How can Commerce's data assets be made more accessible and
valuable to the AI community (e.g., improved API access, web
crawlability, etc.)?
2. How can Commerce develop intuitive and accessible data portals
that facilitate easy navigation and retrieval of data sets?
3. Which users should Commerce consider when disseminating its AI-ready
data? Which atypical users should Commerce be sure to consider?
4. What measures can be taken to encourage user-friendly
interfaces, including clear labeling and readable
[[Page 27413]]
formats, for Commerce's online data resources?
5. How can Commerce better understand the needs of users for its
data and the return on its investment in making its data more AI-ready?
Partnership Engagement
1. How can industry and academic stakeholders collaborate with the
government to shape the design and dissemination of AI-ready open data?
2. What are the potential areas of partnership, and how can
industry and academia contribute to enhancing data quality, integrity,
and usefulness for AI purposes?
Data Integrity and Quality
1. What are best practices that industries have employed to enhance
the integrity and accuracy of public data when used in AI applications?
What are best practices for data verification and validation? What are
best practices for conducting regular audits and quality checks of data
used in AI applications?
2. How can we collectively address challenges related to
authenticity, bias, privacy, data quality, equity, and ethical use while
maintaining transparency and accountability?
3. What security protocols can be developed to mitigate risks of
unauthorized data access and manipulation?
4. How can Commerce promote transparency in data sourcing and
processing methods to enhance trust and reliability? What is the
expectation for reporting the quality of its data and how can we ensure
that information will be carried through and presented to the end user?
5. What validation processes can be established to maintain and
verify data accuracy and consistency?
6. How can Commerce facilitate comprehensive and transparent data
documentation for replication and analysis?
Data Ethics
1. What steps are needed to establish clear legal and ethical
guidelines for AI data usage, ensuring privacy rights, preserving
property rights, and focusing on equitable outcomes?
2. What types of policies could Commerce implement to identify and
mitigate biases in AI algorithms, including ensuring diverse data
representation?
3. What are the best protocols for ethical data collection,
processing, and storage that prioritize data integrity and accuracy?
Commerce invites your comments and insights on the above questions,
as well as any additional input you deem relevant.
Oliver Wise,
Chief Data Officer, Department of Commerce.
[FR Doc. 2024-08168 Filed 4-16-24; 8:45 am]
BILLING CODE P
</pre></body>
</html>