1. Introduction
Scholarly communication infrastructures have been a topic of increasing interest in recent years. The August 2022 US Office of Science and Technology Policy (OSTP) “Nelson memo” (Nelson, Reference Nelson2022) spurred conversation on how to ensure the sustainability, interoperability, and scalability of the core systems that provide a reliable, trusted foundation for evidence-based decision-making. Throughout the US Government’s 2023 Year of Open Science, scholarly communications stakeholders discussed infrastructures to support evolving requirements. While the memo itself does not mention infrastructure, compliance with access mandates necessitates grant reporting that requires support from publishers, content platforms, and related systems. Stakeholders leverage persistent identifiers (PIDs) to enable funding agencies, researchers, and others to track and evaluate the impact of scholarly outputs across platforms and systems (National Information Standards Organization (NISO), 2016). However, locating and linking unique publications, data, people, and organizations via PIDs is just the first step toward enabling innovation through interoperability.
Scholarly publishing systems and workflows are largely built around journal articles, often overlooking other outputs like books, datasets, and media, which may require their own infrastructures or modifications to existing ones. Supporting the increasing availability and use of diverse scholarship beyond articles needs further attention (Watkinson, Reference Watkinson, Drummond, Stern and Watkinson2023).
Infrastructures supporting specific outputs have emerged. Complex book objects (Ricci, Reference Ricci, Drummond and Watkinson2023) are the focus of The Directory of Open Access Books (DOAB) and the Open Access eBook Usage Data Trust (OAeBUDT). Data deposition, which the Nelson memo introduced as a compliance requirement, relates to efforts like DataCite, Make Data Count, and Dryad. Institutional repositories are supported by Jisc’s Institutional Repository Usage Statistics (IRUS) aggregation service for UK (IRUS-UK) and US (IRUS-US) institutions (Lambert, Reference Lambert, Drummond and Watkinson2023).
This paper describes the US infrastructure landscape for scholarship impact analysis in 2023, with the Nelson memo and the European Open Science Cloud’s (EOSC) Interoperability Framework providing context. Existing challenges to making usage and impact data more FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al., Reference Wilkinson, Dumontier, Aalbersberg, Appleton, Axton, Baak, Blomberg, Boiten, da Silva Santos, Bourne, Bouwman, Brookes, Clark, Crosas, Dillo, Dumon, Edmunds, Evelo, Finkers and Mons2016) are presented alongside recommendations to improve the current state; both reflect discussions among invited experts at the Exploring National Infrastructure for Public Access and Impact Reporting workshop (Drummond, Reference Drummond2023a). This paper supplements the Proceedings of the NSF Workshop with context-setting for topics covered in those facilitated discussions, including usage and impact across publication types and what lessons the US might learn from EU work in this space. The discussions, described in detail in the proceedings, were designed to explore opportunities and elicit recommendations for action; those recommendations form the basis for the opportunities for action and next steps described in this paper, which focuses on the potential of interoperable economies of scale through shared national data infrastructure.
2. The current landscape
Standardized content usage reporting has been a mainstay of scholarly communications for over 20 years. Project COUNTER has evolved into a community-maintained usage metrics reporting standard that governs the process of distributed data processing, i.e. “COUNTING” across publishers, libraries, and their service providers (such as content hosting platforms). Usage and impact data must be retrieved and/or aggregated from across repositories, platforms, and services. Often, this toolchain includes proprietary, selective, and/or duplicative services, many of which do not allow for automated usage harvesting.
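Where a platform does support automated harvesting, the COUNTER Release 5 Code of Practice pairs its reports with the COUNTER_SUSHI web API. The sketch below is a minimal illustration of such a harvest, assuming a hypothetical SUSHI endpoint and placeholder credentials; exact report paths, parameters, and supported reports vary by provider.

```python
# Minimal sketch of automated COUNTER R5 harvesting via the COUNTER_SUSHI API.
# The base URL, customer_id, and requestor_id are placeholders; real values are
# issued by each content platform, and available report paths differ by provider.
import requests

BASE_URL = "https://example-platform.org/sushi"  # hypothetical SUSHI endpoint

params = {
    "customer_id": "YOUR_CUSTOMER_ID",
    "requestor_id": "YOUR_REQUESTOR_ID",
    "begin_date": "2023-01",
    "end_date": "2023-12",
}

# Title Master Report ("/reports/tr" in COUNTER R5); some platforms also expose
# standard views such as "/reports/tr_j1" for journal requests.
response = requests.get(f"{BASE_URL}/reports/tr", params=params, timeout=30)
response.raise_for_status()
report = response.json()

# Each report item carries title-level metadata and monthly performance counts.
for item in report.get("Report_Items", []):
    title = item.get("Title")
    for perf in item.get("Performance", []):
        period = perf["Period"]["Begin_Date"]
        for inst in perf["Instance"]:
            print(title, period, inst["Metric_Type"], inst["Count"])
```

Even with such an API, usage from platforms that are not COUNTER-compliant or do not offer automated harvesting must still be collected and reconciled by other means, which is the gap described above.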
Use cases for scholarly usage data vary across diverse stakeholders. Publishers, librarians, and their vendors make up the COUNTER membership, yet technical and resource limitations, combined with content distributed across multiple locations, mean that no stakeholder, including research funders, has a complete view of usage (Mellins-Cohen, Reference Mellins-Cohen, Drummond and Watkinson2023). Publishers have multiple touchpoints in this ecosystem and may rely on technology partners like content hosting platforms to provide COUNTER-compliant usage to their internal editorial and marketing teams in addition to authors, who have begun to request usage information along with royalty reports, and librarians, who have long used usage to inform collection development (Drummond and Hawkins, Reference Drummond and Hawkins2022).
It must also be noted that usage data (e.g. content views and downloads) and impact data (which may or may not include usage) are often distinct. Altmetric, for example, does not list usage data among its sources. The assessment realm, rooted in tenure and promotion in higher education, is seemingly less straightforward than usage statistics. The terms themselves are fraught. The Declaration on Research Assessment (DORA), for example, states in its Guidance on the responsible use of quantitative indicators in research assessment that “While the term ‘metric’ suggests a quantity that is being measured directly, ‘indicator’ better reflects the fact that quantities being used in research assessment are more often indirect proxies for quality” (DORA, 2024, 3). The Society Publishers Coalition has a Vital Statistics working group to develop a standardized list of metrics that can be applied across journals. Their stated need to be able to communicate how their metrics are calculated may be a good example of the kind of use case recommended in the NSF workshop, “…to establish value and increase awareness at the cross-sections of usage and accountability, assessment, and transparency” (SoPC, 2023).
Access mandates for publicly funded scholarship coupled with the movement for responsible research metrics have led to significant interest from various stakeholders in analyzing usage across business models, disciplines, and content formats. Open Access (OA) directly involves authors and funders as primary stakeholders of usage data, changing incentive structures within this complex landscape (Ricci, Reference Ricci, Drummond and Watkinson2023). For example, in the UK, the Jisc IRUS service aggregates content across 31 item types, providing standardized analytics for institutional repositories (IRs), using COUNTER data in a scalable, extensible model (Lambert, Reference Lambert, Drummond and Watkinson2023). The European Open Science Cloud (EOSC) uses the COUNTER standard to support interoperable data reporting for over 200 repositories (Manghi, Reference Manghi, Drummond and Watkinson2023).
While the usage landscape in the US similarly supports the international nature of modern research and publishing, direct federal research and development investment in similar services has not emerged, likely due to existing, legacy infrastructures meeting current needs and the comparatively delayed introduction of a public access requirement for federally funded research outputs. Many medium-sized and larger publishers, for example, operate in multiple countries, and even small publishers or organizations such as US university and library presses may use overseas or multinational vendors and tools. One popular example is the open source Open Journal Systems (OJS) from the Public Knowledge Project (PKP), which is housed at Simon Fraser University in Canada. Its 2023 annual report shows it is used in 44 countries for over 40,000 journals. Infrastructures such as those in the PID landscape also operate globally. Crossref, for example, was founded over 20 years ago by large “legacy” publishers, a history that still raises concerns of undue influence by such publishers (Okune and Chan, Reference Okune and Chan2023), even as Crossref’s membership has diversified far beyond that early group to over 20,000 members in 160 countries (Crossref, n.d.).
Where the US landscape differs from other countries and regions is the relative lack of national or regional public funding and coordination for existing and emergent infrastructures related to public access. While in-depth discussion of issues of infrastructure ownership, governance, and sustainability is both necessary and welcome, knowledge exchange occurs organically at conferences and events hosted by associations (e.g. the Society for Scholarly Publishing (SSP), the Association of Learned and Professional Society Publishers (ALPSP), the Association of University Presses (AUPresses)) and cross-industry membership organizations (e.g. the National Information Standards Organization (NISO), Educause, the Coalition for Networked Information (CNI)), with supporting analyses and resources generated by privately supported efforts (e.g. Invest in Open Infrastructure (IOI), the Higher Education Leadership Initiative for Open Scholarship (HELIOS)) and specialty consulting firms (e.g. Ithaka S+R, Clarke & Esposito, and Strategies for Open Science (Stratos)).
Coordinated action in the distributed environment of usage metrics involves many stakeholders who are often in collaborative competition or “co-opetition.” This necessitates trust and transparency in addition to logistical interoperability. Accurately measuring the impacts of publicly accessible publishing is itself a policy issue and is highly influenced by the data sources used (Basson et al., Reference Basson, Simard, Ouangré, Sugimoto and Larivière2022). Coordination could strengthen infrastructure networks through scale while accommodating the complexities of varied scholarship stakeholders and use cases. In this global environment, scholarship moves across borders, requiring interoperable or shared solutions; such organizations and initiatives overlap, collaborate, and rely upon the same infrastructures (Stern, Reference Stern, Drummond, Stern and Watkinson2023).
2.1. Shared terminology improves interoperability
Across the variety of use cases, there is no common terminology, set of principles, or rules of engagement around the access and use of data for impact assessment. Federal policy distinguishes public access from open access; both drive the aggregation of usage data. Infrastructures that support licensing and business model identification do not commonly or consistently distinguish among the multiple variations. OA indicators are likely to substitute for the term “public access,” which is not commonly or reliably used in the scholarly communications supply chain. Yet common language and frameworks are key for interoperability (Manghi, Reference Manghi, Drummond and Watkinson2023). The NSF workshop identified the need for a crosswalk or mapping to account for differences in terminology as well as measures and schema (Drummond, Reference Drummond2023a).
2.2. Benefitting from metadata and PID infrastructures
Metadata is crucial to identify and link associated outputs across different platforms and minimize broken URLs through the use of PIDs (Cousijn et al., Reference Cousijn, Braukmann, Fenner, Ferguson, van Horik, Lammey, Meadows and Lambert2021). Consider a preprint with an underlying dataset and associated software code that is published as a journal article with peer-review reports. Or a multimodal book including linked video clips, annotations, and an associated dataset of primary sources. Without accurate, interoperable metadata, connecting these outputs is challenging and time-intensive, if not impossible.
Open, community-supported infrastructures, such as Crossref and DataCite, have evolved to meet this need. For example, data citations link datasets registered with a DOI in DataCite to related content registered with Crossref. Interoperability and coordination between such organizations is necessary. The scholarly communications supply chain must work together to ensure such information is interoperable and linkable (ORFG, 2024).
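As a rough sketch of how such PID-based links can be followed in practice (not a workflow prescribed by either organization), the example below queries the public Crossref and DataCite REST APIs for a work’s declared relationships. The DOIs are placeholders, and relationship metadata appears only where it has been deposited.

```python
# Sketch: traverse PID-based links between a Crossref-registered article and a
# DataCite-registered dataset using the public REST APIs. DOIs are placeholders.
import requests

ARTICLE_DOI = "10.1234/example-article"   # hypothetical Crossref DOI
DATASET_DOI = "10.5678/example-dataset"   # hypothetical DataCite DOI

# Crossref exposes deposited relationships in the "relation" section of a work record.
work = requests.get(f"https://api.crossref.org/works/{ARTICLE_DOI}", timeout=30)
if work.ok:
    relations = work.json()["message"].get("relation", {})
    for rel_type, targets in relations.items():
        for target in targets:
            print(f"Article {rel_type}: {target.get('id')}")

# DataCite records list related identifiers (e.g. IsSupplementTo) in the DOI's attributes.
dataset = requests.get(f"https://api.datacite.org/dois/{DATASET_DOI}", timeout=30)
if dataset.ok:
    attrs = dataset.json()["data"]["attributes"]
    for related in attrs.get("relatedIdentifiers", []):
        print(f"Dataset {related.get('relationType')}: {related.get('relatedIdentifier')}")
```

When such relationships are deposited consistently on both sides, the preprint, dataset, code, and published article described above can be connected automatically rather than by manual searching.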
2.3. Addressing usage considerations across scholarship outputs
Journal articles and books have long been part of the online scholarly usage landscape. Other outputs often do not have dedicated, standardized, or reliable usage reporting. Grey literature, that is, reports and papers produced outside of traditional publishing, is diverse, diffuse, and often outside of established library- and repository-based acquisition and reporting workflows (Schöpfel, Reference Schöpfel2018). Yet grey literature is vital for public policy (Lawrence et al., Reference Lawrence, Houghton and Thomas2014), and as more of it becomes discoverable online through services like Overton and Policy Commons, the scholarly community may show increasing interest in its usage.
While the Nelson memo specifically references peer-reviewed content, linking all types of scholarly outputs to policy goals may be of growing interest for funding agencies and the larger scholarly community. Issues like climate change and the pandemic response have heightened attention to how public access impacts the speed of innovation, raising interest in how outputs like preprints, software, and protocols are accessed and used worldwide.
2.3.1. Data
Defining data across disciplines is a challenge (Sever, Reference Sever2023), and areas like the humanities often do not generate what is traditionally considered data, or at least do not recognize their outputs as such (Ruediger and MacDougall, Reference Ruediger and MacDougall2023). Data citation practices are still developing for researchers and publishers (Lowenberg et al., Reference Lowenberg, Lammey, Jones, Chodacki and Fenner2021), so evaluating the impact of the full breadth of outputs, publicly funded or not, remains a significant hurdle. Per Kristi Holmes, “A true understanding of the investment, reach, and impact made in publicly accessible research data is only possible with open, transparent, and responsible data metrics” (Holmes, Reference Holmes2023, 3).
2.3.2. Books
Books and book metadata have a particularly complex supply chain rooted in print sales with book usage distributed across multiple platforms (Clarke and Ricci, Reference Clarke and Ricci2021). While increased distribution benefits readers and authors, it complicates the process of identifying, aggregating, and reporting usage for a given title with multiple URLs and digital locations. Digital book chapters also exacerbate difficulties when not given distinct Digital Object Identifiers (DOIs) and metadata to facilitate linking (Lin, Reference Lin2016). This makes it challenging for research administrators, who rely on Research Information Management (RIM) systems, which systematically under-represent book literature, to identify faculty who have authored book chapters (Bryant et al., Reference Bryant, Watkinson and Welzenbach2021) and assess the impact of such work (Kemp and Taylor, Reference Kemp and Taylor2020). Bibliometricians may put in considerable manual effort to get a full picture of the institutional affiliations involved in a contributed volume. Some publishers do not register DOIs for books such as professional medical titles (Conrad and Urberg, Reference Conrad and Urberg2023). Without such core metadata, the ability to link funding to outputs is challenged (Tkaczyk, Reference Tkaczyk2023), and the entire landscape suffers as a result (Conrad and Urberg, Reference Conrad and Urberg2021).
2.4. Identifying barriers to impact assessment
While digital books have long been hosted on multiple platforms, the liberal reuse licenses of publicly accessible resources mean that other output formats will also be increasingly distributed. From an impact assessment perspective, multiple content hosting providers and platforms (URLs), and potentially different versions of works, must be considered. Getting holistic usage metrics from all sources and ensuring they are comparable (if they are not COUNTER-compliant) takes time. Individuals within organizations in the usage data supply chain currently decide how to standardize and aggregate statistics for individual outputs, creating a downstream ripple effect that influences the interoperability of the data that underpins assessment of OA impacts by discipline, region, institution, output type, and format. COUNTER recently launched a community consultation on the issue of syndicated content (COUNTER, 2024).
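The toy sketch below makes that standardization step concrete: it reconciles invented usage exports from two hypothetical platforms, one COUNTER-styled and one ad hoc, into per-DOI totals. The records, field names, and metric mapping are illustrative assumptions, but each local mapping decision of this kind shapes how comparable the aggregated data ultimately is.

```python
# Toy sketch: reconciling usage records exported by different platforms into a
# single per-DOI total. Records, field names, and metric labels are invented.
from collections import defaultdict

platform_a = [  # COUNTER-style export
    {"DOI": "10.1234/example", "Metric_Type": "Unique_Item_Requests", "Count": 120},
]
platform_b = [  # ad hoc export with different field names and labels
    {"doi": "10.1234/EXAMPLE", "metric": "downloads", "value": 45},
]

# Local decision: treat "downloads" as equivalent to Unique_Item_Requests.
# Choices like this ripple downstream into any cross-platform comparison.
METRIC_MAP = {"Unique_Item_Requests": "item_requests", "downloads": "item_requests"}

totals = defaultdict(int)

for rec in platform_a:
    metric = METRIC_MAP.get(rec["Metric_Type"])
    if metric:
        totals[(rec["DOI"].lower(), metric)] += rec["Count"]

for rec in platform_b:
    metric = METRIC_MAP.get(rec["metric"])
    if metric:
        totals[(rec["doi"].lower(), metric)] += rec["value"]

for (doi, metric), count in totals.items():
    print(doi, metric, count)
```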
Citations have long been the main currency of evaluating research outputs (White, Reference White2019), but they reflect only a subset of scholarly publication usage. Books are relatively sparsely included in services like Web of Science and Scopus. Non-citation-based use can include student and researcher reading and downloads, annotations and social network sharing, and context-specific professional workforce engagement by clinicians and other “on the job” practitioners. Sometimes usage data is explicitly acknowledged. Other times it is implied, though “It is well understood that metrics include more than citations and usage” (Lowenberg et al., Reference Lowenberg, Chodacki, Fenner, Kemp and Jones2019, 17).
Though the focus of research assessment is often the researcher or institution, the meta-analysis fields of bibliometrics and scientometrics are growing (Organisation for Economic Co-operation and Development [OECD], 2023). Improving the evaluation of public and open access models will require timely, granular access to interoperable, high-quality, “trusted” impact metrics.
Yet entities that create usage and impact data may face barriers when adopting standards or engaging in multinational efforts. Challenges to providing usage reporting exist for many smaller publishing organizations, including library publishers (Mellins-Cohen, Reference Mellins-Cohen, Drummond and Watkinson2023), smaller university presses (Sherer, Reference Sherer2023), and university library repositories (Lambert, Reference Lambert, Drummond and Watkinson2023).
Project COUNTER itself operates on a volunteer network and has only recently added a part-time role to work with its half-time Executive Director (Mellins-Cohen, Reference Mellins-Cohen, Drummond and Watkinson2023). It is reasonable to consider the resources needed for the interconnected infrastructures that support metrics interoperability. Workshop participants discussed operational and sustainability risks associated with thin staffing levels, noting that financing infrastructure staffing and cross-infrastructure coordination should be a priority.
2.5. Understanding the role of national and multinational factors
The practicalities of access mandates, related reporting requirements, and the data exchange to support them are inherently international. However, it is unclear if global infrastructures or federated networks of domestic infrastructures are best positioned to interoperate. Multinational infrastructures, including those that support scholarly communications, must carefully navigate layers of complex regulation. Multiple NSF workshop participants noted that the challenges of data sharing are often less technical than they are administrative, legal or policy-based (Drummond, Reference Drummond2023a).
Transparency, trust, and participation incentives have been highlighted as requisite to advancing global usage data interoperability. Alignment among stakeholders was a theme of the NSF workshop, particularly for incentives (Drummond, Reference Drummond2023a). However, cultural perspectives vary on sensitivity, security, and data management requirements. From a legal and ethical perspective, the exchange of Internet Protocol (IP) addresses tied to the “use” or readers of scholarship at a specific location, or by a particular author or scholar, can raise concerns over potential negative uses of such information, from misinformation campaigns to surveillance. While many countries treat IP addresses as personally identifiable information subject to privacy protections such as the European Union’s (EU) General Data Protection Regulation (GDPR), the US does not have such a comprehensive federal law. Rather, the regulation of IP addresses as personally identifiable information varies by state in a very dynamic legislative space.
Whether usage data can legally be exchanged is a separate question from whether it can ethically be exchanged. In the workshop, the CARE Principles for Indigenous Data Governance (Collective Benefit, Authority to Control, Responsibility, Ethics) (Carroll et al., Reference Carroll, Garba, Figueroa-Rodríguez, Holbrook, Lovett, Materechera, Parsons, Raseroka, Rodriguez-Lonebear, Rowe, Sara, Walker, Anderson and Hudson2020) were raised as a way to consider data sensitivity. Is there a collective benefit to sharing data? If so, who has the authority to allow data sharing and use and the responsibility for ensuring such use is appropriate and legal? Is such sharing and use ethical in the eyes of the people behind the data—in this case the individuals and organizations behind the IP addresses attributed to usage data? At the workshop, organizer Christina Drummond suggested that scholarly communications are rapidly approaching a future where FAIR and CARE are necessary to ethically share usage data.
Exploring such matters across borders requires coordination, and the US is widely considered to be playing catch-up to Europe and other leading regions and countries on open and public access, data brokerage regulation, and privacy policy. Europe supports infrastructure networks through funding mechanisms like the European Research Infrastructure Consortium (ERIC) approach, which allows infrastructure organizations to achieve economies of scale and better alignment as a single regional network (Stern, Reference Stern, Drummond, Stern and Watkinson2023). The approach of America’s agencies and innovation networks remains an open question, and the opportunity exists to build in best practices, such as those from UNESCO.
3. Opportunities for action
Invited experts at the April 2023 NSF workshop Exploring National Infrastructure for Public Access and Impact Reporting identified four key areas for action to strengthen infrastructure for impact metrics for publicly accessible scholarship: 1) engaging stakeholders, 2) piloting a Minimum Viable Product, 3) understanding shared values and principles, and 4) addressing usage data ownership (Drummond, Reference Drummond2023a). This section outlines the opportunities identified to better support impact and usage analytics across the research ecosystem.
3.1. Engage stakeholders to best coordinate and leverage resources
Workshop participants noted how infrastructure projects require substantial commitments of financial and staff resources. Participants emphasized that care should be taken to avoid effort duplication. The problems faced are too grand for any effort to “go it alone” and time cannot be wasted “recreating the wheel” (Drummond, Reference Drummond2023a, 18). Rather, participants strongly voiced a need for funding to support coordination and collaboration, so aligned efforts can advance together. Specific resources to engage included the following:
• Standards for metrics of interest (e.g. COUNTER), and for persistent, unique identifiers (e.g. Open Researcher and Contributor IDs (ORCIDs), Research Organization Registry (RORs))
• Federal public policy guidance, e.g. the OSTP Nelson memo
• Open metadata, e.g. Crossref
• Staff and leadership of related infrastructure efforts, e.g. COUNTER, OA Switchboard, OA Book Usage Data Trust (OAeBUDT)
• Existing research metrics and dashboard providers, e.g. Digital Science
Greg Tananbaum spoke about how equity relates to usage data, noting that the Open Research Funders Group (ORFG) is looking to “build a system that enables evidence-based policymaking, that improves public engagement with and trust in science” (Tananbaum, Reference Tananbaum, Drummond and Watkinson2023).
3.1.1. Leverage established networks
Given the diversity of scholarly outputs across public and private stakeholders, coordinating engagement across professional networks could increase the speed of infrastructure development while fostering interoperability. As described below, different types of organizational networks and consortia within the US are well-positioned to assist in engaging their membership.
3.1.2. National research and education networks (NRENs)
NRENs illustrate how federated networks of independent infrastructures can provide internet and identity access and authentication services for higher education worldwide. In the US, Internet2 provides such services; outside the US, peer NRENs provide support for open science. Such membership organizations offer support spanning scholarly disciplines and associated communities, including the varied output formats that are increasingly open access and distributed across platforms. An opportunity exists to explore how scholarly infrastructures could leverage a US NREN and its global peer network.
3.1.3. Consortia and durable coalitions
In the US, the culture of collaboration among libraries is well established through membership consortia, such as the Big Ten Academic Alliance (BTAA). Libraries each belong, on average, to 2.5 consortia, and one such consortium, Lyrasis, has over 1,000 members and serves as a US hub that supports libraries connecting to global infrastructures such as ORCID and IRUS (Lair, Reference Lair, Drummond and Watkinson2023). This example illustrates how membership networks already support rapid, community-informed infrastructure development through scalable network-based engagement.
Funding agencies and private foundations coordinate related efforts as well. The ORFG, for example, works “to develop actionable principles and policies that promote greater dissemination and transparency and replicability and reuse of papers, data, and a whole range of other research types” (Tananbaum, Reference Tananbaum, Drummond and Watkinson2023). They support The Higher Education Leadership Initiative for Open Scholarship (HELIOS), an example of the “durable coalitions” the ORFG aims to facilitate (Tananbaum, Reference Tananbaum, Drummond and Watkinson2023).
Workshop experts agreed that engaging existing networks should be the starting point when considering coordinating federal resources.
3.2. Learn from a minimum viable product (MVP)
In the NSF workshop, experts documented the need to learn from an MVP related to impact and usage data exchange, such as the Open Access Book Usage Data Trust’s International Data Space (OAeBUDT IDS), to support multiple use cases. Specific questions they suggested an MVP could answer included:
• What users will pay for, to ensure sustainability
• Whether the focus should be on impact, use, or both
• What data is of value to exchange
• What problem would be solved through controlled data exchange
Drummond noted that the OAeBUDT’s piloting of an International Data Space (IDS) could serve as such an MVP, as its limited focus on OA book usage could provide potentially extensible infrastructure tested by scholarly publishing stakeholders who work with content beyond books.
3.2.1. Data intermediary efforts
Europe’s Data Governance Act empowers a network of neutral, certified data intermediaries known as International Data Spaces (IDS) to ensure the ethical exchange and controlled use of sensitive data across public and private parties (Nagel and Lycklama, Reference Nagel and Lycklama2021). Since 2020, a group of global stakeholders, including universities, repositories, libraries, book discovery platforms and publishers, have been exploring how to leverage the IDS model to address data sensitivity concerns pertaining to the international aggregation and curation of granular OA book usage data (Drummond, Reference Drummond, Drummond and Watkinson2023b). Using the narrow use case of digital book-related view and download metrics, the OAeBUDT is piloting the IDS Reference Architecture Model and its associated standards to securely route and distribute sensitive usage data across public and private parties at scale. It features a community governance model, maintains data sovereignty, and accommodates the range of public access stakeholders as data contributors and recipients. While currently focused on books, stakeholders include extensibility as a core principle, paving the way to extend such infrastructure should the IDS model generate a return and make sense for other publicly accessible outputs given different data supply chains and impact vocabularies.
3.2.2. Federal demonstration project networks
Federal agencies are coordinating on pilots to improve access to potentially sensitive federal data sets through efforts such as the National Secure Data Service Demonstration Project (NSDSD), which aims to facilitate use of US federal and nonfederal data to better inform decision-making. It requires collaboration across government and external stakeholders and is currently testing a pilot with America’s Datahub Consortium (Madray, Reference Madray, Drummond and Watkinson2023). Together, their pilot project aims to facilitate the secure public/private sharing and use of federal statistics that relate to the general public.
3.2.3. Infrastructure versus projects or research centers
While research into MVP infrastructure development is underway, scholars studying data collaboratives across the globe have found that many become one-offs as they do not scale, struggle with sustainability, or raise concerns over how the data itself is exchanged (Verhulst, Reference Verhulst, Drummond and Watkinson2023). Yet many examples of successful trusted data intermediaries, such as those documented by Data Collaboratives, already exchange data for independent, cooperative, or directive use across public and private partners (Verhulst et al., Reference Verhulst, Young, Winowatan and Zahuranec2019).
3.3. Understanding shared values and principles
The NSF workshop attendees unanimously supported learning from existing models and emerging efforts, including the ZERO Copy Integration Framework in Canada, La Referencia, and REDALYC in Latin America, and the European IDS model (Drummond, Reference Drummond2023a). Cross-stakeholder collaboration is necessary to avoid effort duplication, diverging standards, and fractured community governance.
Such collaboration is needed to agree on definitions and shared language, establish values and principles, inform best practices, and develop standards. Parties to bring together span research, publishing, and policy, including funding agencies, repositories, libraries and their consortia, publishers and their service providers, federal agencies and policymakers, and scholars and their affinity groups. Potential venues for coordination include the US National Information Standards Organization (NISO), the Data Curation Network (DCN), the Research Data and Preservation Association (RDAP), the Research Data Alliance (RDA-US), and HELIOS, whether as models or partners. Involving representatives in planning efforts can be mutually beneficial. The experiences of efforts in the EU, with 450 million people in 27 countries using 24 languages, may be useful for US organizations navigating a patchwork quilt of state-based regulations alongside federal data policy.
European efforts that could inform and/or partner with US infrastructure development include the following:
• The infrastructure consortium Open Scholarly Communication in the European Research Area for the Social Sciences and Humanities (OPERAS), which is in the process of becoming a European Commission-recognized ERIC and which unites members running infrastructures like IRUS (such as Jisc), DOAB, and the OAPEN Library (OAPEN Foundation).
• The European Open Science Cloud (EOSC) and its network of services maintained in conjunction with OpenAIRE. Over 200 repositories contribute usage information to this infrastructure, which leverages EOSC to exchange interoperable data. This usage data can be accessed by data scientists and policymakers, as well as researchers, for an aggregate view of the usage of their outputs across participating institutions (Manghi, Reference Manghi, Drummond and Watkinson2023).
Notably, governance development for the OAeBUDT data space for scholarly impact metrics exchange, made possible through a three-year grant from the Mellon Foundation, is being coordinated by representatives of OPERAS, OpenAIRE, and the University of North Texas. Given the variety of global stakeholders, NSF workshop experts encouraged documenting and sharing the values and principles guiding work in this shared space.
3.4. Research issues of usage data ‘ownership’ and authority over use
Research is needed to understand which authoritative stakeholder can authorize downstream usage data sharing and use for given scholarship outputs, disciplines, and political geographies. Culturally or geographically dependent policies and norms affect organization policies and workflows, and there is no one-size-fits-all approach (Drummond, Reference Drummond2023a). When contractual relationships related to the scholarly publishing lifecycle span authors and universities, editors, publishers, aggregators, discovery services, digital libraries, and the many data processors and controllers behind these organizations, it is unclear whether or how “data ownership” and “data rights” follow the data or who is positioned to authorize the application of usage data for different purposes (Drummond, Reference Drummond2023a, 6). For all these reasons, “open as possible and as controlled as necessary” (Drummond, Reference Drummond2023a, 8; Reference Drummond, Drummond and Watkinson2023b) may be needed in the face of artificial intelligence (AI) and big data computation of public domain or harvestable data. In short, ethical, legal, and contractual research questions abound (Rabar, Reference Rabar2023).
Workshop participants concluded that the complexity of this issue necessitates research into applicable laws and norms and further recommended the development of shared model license language.
4. Recommendations and next steps
The overarching, unanimous recommendation from workshop experts is to consider existing resources and networks first and foremost: to explore, leverage, and invest in existing models, platforms, and providers (Drummond, Reference Drummond2023a). This includes library consortia as well as the use and development of standards and engagement with PID infrastructures. It could also be valuable to explore federated public funding models such as the European Research Infrastructure Consortium (ERIC), which will allow OPERAS to support open infrastructures through financial commitments from EU member states (Stern, Reference Stern, Drummond, Stern and Watkinson2023).
The development of a data intermediary or data cooperatives to formalize infrastructure partnerships could facilitate the use of data in a responsible, trusted way (Verhulst, Reference Verhulst, Drummond and Watkinson2023). Leveraging the European community-governed IDS model could provide structure to address privacy and ethical concerns of downstream use of detailed usage data, for example, in unbounded AI training.
Near-term priority projects were identified at the end of the NSF workshop. Since then, the OAeBUDT MVP data space for usage metrics has advanced, with a proof-of-concept IDS launch planned for 2024, prior to extensibility research scheduled for 2025. NSF support awarded since the workshop now funds teams crosswalking impact-related vocabulary definitions and documenting usage data supply chains for publicly accessible journals and open data, to complement such information for books.
Additional identified research opportunities that are not yet underway, to the authors’ knowledge, include legal research on usage data “ownership,” the intellectual property and privacy dimensions of usage and impact data, further exploration of use cases, values, and principles, and stakeholder education on the issues involved and potential solutions. Funding coordination for both existing infrastructure efforts and emerging models was highlighted as critical to near-term success and international alignment (Drummond, Reference Drummond2023a).
5. Conclusion
Momentum is growing to share and evaluate usage and impact data to support evidence-based decision-making. The scholarly community shares a desire to illuminate the impact of the breadth of funded research outputs and has collaboratively defined a path forward to further explore whether shared infrastructure will benefit the varied stakeholders involved in publicly accessible scholarship publication and discovery.
Data availability statement
None.
Acknowledgments
The authors are grateful to Charles Watkinson for co-organizing the National Science Foundation (NSF) workshop on which this paper is based, Dr. Katherine Skinner for designing and facilitating, and the Coalition for Networked Information (CNI) for hosting.
Author contribution
Conceptualization: Christina Drummond. Funding Acquisition: Christina Drummond. Methodology: Christina Drummond; Jennifer Kemp. Writing original draft: Christina Drummond; Jennifer Kemp. Writing review and editing: Christina Drummond; Jennifer Kemp. All authors approved the final submitted draft.
Funding statement
This work was supported by the National Science Foundation under research grant 2315721. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interest
Jennifer Kemp is a volunteer member of the Open Access eBook Usage Data Trust (OAeBUDT) Board of Trustees and was compensated for work on this paper from the grant that supported the NSF workshop. Christina Drummond co-organized the workshop discussed in this paper and serves as the Executive Director for the Open Access eBook Usage Data Trust (OAeBUDT) through her host institution, the University of North Texas.