URI Policy

Author

Bart Kleijngeld (Alliander)

Version

v0.9

Feedback

Issue on GitHub (Netbeheer-Nederland/doc-uri-policy)

Abstract

Managing data can be challenging, especially when it is exchanged across organizational boundaries or integrated from multiple sources. The FAIR (Findable, Accessible, Interoperable, Reusable) principles provide guidance to address these challenges. By emphasizing machine-actionability and the extended use of metadata, they help ensure that data can be properly managed, governed, made interoperable, and reused.

To implement FAIR data management, the Web provides a fitting foundation, particularly when making use of Semantic Web technologies, Linked Data principles, and other W3C standards. This requires the ability to identify (meta)data on the Web, and for that URIs (Uniform Resource Identifiers) are necessary: they are the identifiers of resources on the Web.[AWWW]

Scope

This document provides the necessary background information and guidance to help one to:

  • Design URIs for resource identification on the Web

  • Implement representation retrieval when dereferencing those URIs

Actual design and implementation details are out of scope.[1]

FAIR principles

This document helps satisfy the following FAIR principles [FAIR]:

  • F1. (Meta)data are assigned a globally unique and persistent identifier

  • A1. (Meta)data are retrievable by their identifier using a standardised communications protocol

  • A3. Metadata are accessible, even when the data are no longer available

Conformance

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [RFC2119].

Concepts

This section briefly explains the concepts needed to apply the guidelines that follow, both for designing URIs and for implementing representation retrieval.

Unless specified otherwise, this section assumes the use of the HTTP protocol.

URIs, resources and representations

URIs[2] (Uniform Resource Identifiers) are identifiers of resources on the Web, and a resource is anything that has a URI assigned to it.[AWWW]

Resources can have representations: streams of bytes that reflect the state of the resource at a given time.

Instead of designing URIs, one can also choose to design IRIs.

IRIs are an extension of URIs that allows the use of the full range of Unicode characters. Since each IRI has a corresponding URI form that systems can derive automatically, IRIs are safe to use in place of URIs.
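
As a minimal sketch of how such a URI form can be derived, the Python function below converts the host name with IDNA and percent-encodes the UTF-8 bytes of non-ASCII characters in the path, query and fragment, along the lines of the mapping described in RFC 3987. The function name iri_to_uri and the example IRI are illustrative assumptions, not part of this policy. The sketch ignores userinfo and IPv6 hosts, and relies on Python's built-in idna codec, which implements IDNA 2003.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched by quote(): "%" keeps existing escapes intact, the
# rest are RFC 3986 sub-delims plus ":", "@" and "/". Unreserved characters
# (letters, digits, "-", ".", "_", "~") are never escaped by quote().
_SAFE = "%:@!$&'()*+,;=/"


def iri_to_uri(iri: str) -> str:
    """Derive the URI form of an IRI (simplified sketch, see assumptions above)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    if host:
        # Internationalised host names become ASCII "xn--" labels.
        host = host.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Elsewhere, non-ASCII characters become percent-encoded UTF-8 bytes.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://example.org/dieren/café#Kat"))
# http://example.org/dieren/caf%C3%A9#Kat
```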

Information resources

Some resources are purely informational, meaning their essence can be encoded into a representation. These resources are called information resources[AWWW] or — more colloquially — Web documents[COOL-URIS][3].

Representations

Dereferencing the URI of a Web document will provide a representation.

If the representation is served from a different host than the one in the URI of the resource, the server can redirect requests to the URI where the representation is actually hosted, for example with a 307 Temporary Redirect.

Do not use 303 See Other redirects for this purpose. These are intended to redirect URIs identifying real-world resources to Web documents that provide their descriptions.
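
The redirect decision for a Web document can be sketched in Python as follows. This is only an illustration under stated assumptions: the mapping REPRESENTATION_LOCATIONS, the host files.example.net and the function name redirect_for_document are hypothetical and not prescribed by this policy.

```python
from http import HTTPStatus

# Hypothetical mapping from the path of a stable Web document URI to the
# location where its representation is currently hosted.
REPRESENTATION_LOCATIONS = {
    "/animals": "https://files.example.net/ontologies/animals.html",
}


def redirect_for_document(path: str) -> tuple[int, dict[str, str]]:
    """Return the status code and headers for a request to a Web document URI."""
    location = REPRESENTATION_LOCATIONS.get(path)
    if location is None:
        return HTTPStatus.NOT_FOUND.value, {}
    # 307 Temporary Redirect, not 303 See Other: the URI identifies a Web
    # document, and the redirect only points to where its representation lives.
    return HTTPStatus.TEMPORARY_REDIRECT.value, {"Location": location}
```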

Real-world things

Resources which are not information resources are often referred to as real-world things. These include physical, tangible objects, but also abstract concepts. These resources cannot have representations on the Web, since their essence cannot be fully captured by information.[4] However, descriptions of them can be provided by Web documents, which can refer to them by their URI.

"There should be no confusion between identifiers for Web documents and identifiers for other resources. URIs are meant to identify only one of them, so one URI can’t stand for both a Web document and a real-world object." ([COOL-URIS], Section 3, URIs for Real-World Objects)

Refer to that section for more detail.

Descriptions

Obtaining descriptions of real-world things is more involved than for Web documents.

Since real-world things have no representations of their own, dereferencing their URIs should provide access to Web documents that describe them. There are two common approaches to achieve this[COOL-URIS]:

Hash URIs

These URIs use a fragment identifier — the part after # — to identify a real-world thing, while the base URI — the part before # — identifies the Web document that describes the resource. When the hash URI is dereferenced, only the base URI is sent to the server, so the server returns a representation of that document.

Example

http://example.org/animals#Cat is a hash URI that identifies the real-world concept Cat, as indicated by the fragment identifier Cat.

The base URI http://example.org/animals identifies the Web document that provides a description of the concept.
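
The split between the part sent to the server and the part kept by the client can be illustrated with Python's standard library. This is a small sketch; the helper name split_hash_uri is an assumption made for illustration.

```python
from urllib.parse import urldefrag


def split_hash_uri(uri: str) -> tuple[str, str]:
    """Return (document URI sent to the server, fragment kept by the client)."""
    document_uri, fragment = urldefrag(uri)
    return document_uri, fragment


document_uri, fragment = split_hash_uri("http://example.org/animals#Cat")
print(document_uri)  # http://example.org/animals
print(fragment)      # Cat
```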

Hash URIs are easy to set up and use, and are often used for smaller ontologies.

303 (or slash) URIs

These URIs use 303 See Other redirects[HTTPRANGE-14]. When the URI of a resource that is a real-world thing is dereferenced, the server responds with a 303 redirect to a Web document that provides a description of the resource.[5]

Example

http://example.org/animals/Cat is a 303 URI which identifies the real-world concept Cat.

Dereferencing http://example.org/animals/Cat might result in a 303 See Other redirect to the Web document http://docs.example.org/animals/Cat.html which contains an HTML description of the concept.
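
A server-side sketch of this behaviour, written as a plain WSGI application in Python, could look as follows. The mapping DESCRIPTION_DOCUMENTS and the name application are assumptions for illustration; a real deployment would typically configure the redirect in the web server or framework it already uses.

```python
# Hypothetical mapping from the paths of URIs that identify real-world things
# to the Web documents that describe them.
DESCRIPTION_DOCUMENTS = {
    "/animals/Cat": "http://docs.example.org/animals/Cat.html",
}


def application(environ, start_response):
    """WSGI application that answers URIs of real-world things with a 303."""
    path = environ.get("PATH_INFO", "")
    document = DESCRIPTION_DOCUMENTS.get(path)
    if document is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"Not found\n"]
    # The thing itself has no representation, so point to its description.
    start_response("303 See Other", [("Location", document), ("Content-Length", "0")])
    return [b""]
```

For local experiments the application can be served with the standard library, for instance via wsgiref.simple_server.make_server("", 8000, application).serve_forever().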

Both approaches are valid and widely used, and come with their own pros and cons.

Content negotiation

A resource may have multiple representations, in which case content negotiation can be used to let agents choose which representation they want.

Content negotiation is an HTTP mechanism that allows a single URI to serve different representations of the same resource based on parameters such as media type, language, character set, or content encoding.

Agents can encode their preferences in request headers, and if the HTTP server is set up properly it can serve the most appropriate representation.
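
On the client side, encoding such preferences could be sketched in Python as shown below. The helper name build_request and the preference weights are illustrative assumptions; the sketch only constructs the request and does not send it.

```python
from urllib.request import Request


def build_request(uri: str, preferences: dict[str, float]) -> Request:
    """Build a GET request whose Accept header encodes media type preferences.

    `preferences` maps media types to q-values between 0 and 1, where a higher
    value means a stronger preference.
    """
    ordered = sorted(preferences.items(), key=lambda item: -item[1])
    accept = ", ".join(
        media_type if q >= 1.0 else f"{media_type};q={q:g}"
        for media_type, q in ordered
    )
    return Request(uri, headers={"Accept": accept})


request = build_request(
    "http://example.org/animals",
    {"text/html": 0.5, "application/ld+json": 1.0},
)
print(request.get_header("Accept"))  # application/ld+json, text/html;q=0.5
```

Passing the request to urllib.request.urlopen would send it, after which the Content-Type response header shows which representation the server selected.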

Example 1. Content negotiation to obtain an HTML or a JSON-LD representation

Suppose the ontology identified by http://example.org/animals has HTML and JSON-LD representations available.

Retrieving the HTML representation

GET request:

    GET /animals HTTP/1.1
    Host: example.org
    Accept: text/html

Response headers:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8
    Content-Length: 2480
    Vary: Accept
    Date: Tue, 26 Aug 2025 10:15:00 GMT
    Server: Apache/2.4.58 (Unix)

Retrieving the JSON-LD representation

GET request:

    GET /animals HTTP/1.1
    Host: example.org
    Accept: application/ld+json

Response headers:

    HTTP/1.1 200 OK
    Content-Type: application/ld+json; charset=UTF-8
    Content-Length: 2480
    Vary: Accept
    Date: Tue, 26 Aug 2025 10:15:00 GMT
    Server: Apache/2.4.58 (Unix)

This example demonstrated negotiation based on media type using the Accept header.

For further reading:

  • Chapter 4 of [COOL-URIS] discusses hash URIs and 303 URIs, their trade-offs, and how to decide which one to use. Content negotiation is covered as well.

  • [HAWKE] discusses the ambiguity challenges caused by hash URIs.

Guidelines

The guidelines in the table below are intended to help design stable URIs and to provide predictable and appropriate representations.

They are deliberately presented in compact table form for ease of reference. For more information, follow the references to other sources given in the table and throughout the document.

Table reading guide

The table consists of the following columns:

ID

The guideline number.

Conformity

Any of:

  • Constraint: a constraint which MUST be satisfied

  • Principle: a principle which SHOULD be obeyed

  • Practice: a good practice which MAY be applied

Description

The description of the guideline.


Each guideline can be identified by its URI: https://nbnl.info/uri-policy#guideline-{n}, where {n} is the ID given in the table.
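
As a small illustration of this template, the Python sketch below expands it for a given ID and recovers the ID from a guideline URI. The function names are assumptions made for illustration; only the template itself comes from this document.

```python
import re
from typing import Optional

BASE_URI = "https://nbnl.info/uri-policy"
_GUIDELINE_URI = re.compile(re.escape(BASE_URI) + r"#guideline-([1-9][0-9]*)")


def guideline_uri(guideline_id: int) -> str:
    """Expand the template for the guideline with the given ID."""
    if guideline_id < 1:
        raise ValueError("guideline IDs start at 1")
    return f"{BASE_URI}#guideline-{guideline_id}"


def parse_guideline_uri(uri: str) -> Optional[int]:
    """Return the guideline ID encoded in a guideline URI, or None."""
    match = _GUIDELINE_URI.fullmatch(uri)
    return int(match.group(1)) if match else None


print(guideline_uri(4))  # https://nbnl.info/uri-policy#guideline-4
```
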
Table 1. Guidelines[6]
| ID | Conformity | Description |
|----|------------|-------------|
| 1 | Constraint | Anything that needs to be identified on the Web MUST have a URI. |
| 2 | Constraint | Distinct resources MUST be assigned distinct URIs. |
| 3 | Principle | Resources SHOULD be assigned only a single URI by the resource owner.[7] |
| 4 | Principle | URIs SHOULD be stable, i.e. their meaning SHOULD NOT change and representations of the resources they identify, if available, SHOULD remain available. |
| 5 | Principle | URIs SHOULD be as opaque as possible to improve stability and discourage agents from inferring information from an identifier. |
| 6 | Principle | URIs SHOULD favor the HTTPS scheme over the HTTP scheme because of security considerations.[HALPIN] |
| 7 | Practice | URIs MAY encode ownership information in domain names, but this is discouraged because it makes the URI less opaque. |
| 8 | Practice | Hierarchical URIs MAY be used within a clearly defined application domain. |
| 9 | Principle | To identify resources that are real-world things, either a hash URI or a 303 URI SHOULD be used. |
| 10 | Principle | The owner of an information resource SHOULD provide representations for it. |
| 11 | Principle | Resource representations SHOULD be consistent and predictable, so that changes do not break the clients that depend on them. |
| 12 | Practice | Redirects (e.g. 307 Temporary Redirect[8]) MAY be used upon dereferencing URIs to serve representations from different locations. |
| 13 | Principle | If a resource has multiple representations, it SHOULD be possible for clients to use content negotiation to select the most appropriate one. |
| 14 | Principle | If a 303 URI identifying a resource that is real-world thing is |

Do not assign stable URIs to every possible entity. Focus on the resources that are part of your public interface and need to be persistently identified. Internal or transient entities should remain within the application domain and be exposed, if at all, through hypermedia links rather than permanent URIs.

Appendix A: URI reference

With the preceding background and guidance in mind, Team Semantiek has designed URI templates for all the types of definitions it is responsible for.

Refer to the Identification section on each of these pages for the URI templates to use to identify the corresponding resources.

Terms and definitions

303 URI

A URI that identifies a real-world thing and relies on 303 See Other redirects[HTTPRANGE-14]. When it is dereferenced, the server responds with a 303 redirect to a Web document that provides a description of the resource.

application domain

The specific context or area of concern in which a system or model operates, closely related to the notion of bounded context.

content negotiation

HTTP mechanism for serving different representations of the same resource based on client preferences (e.g. media type and language).

dereference

The process of using a URI in a protocol (e.g. HTTP) to retrieve a representation of the identified resource.

FAIR

Principles for making (meta)data Findable, Accessible, Interoperable, and Reusable.

hash URI

A URI that uses a fragment identifier (the part after #) to identify a real-world thing, while the base URI (the part before #) identifies the Web document that describes it. When a hash URI is dereferenced, only the base URI is sent to the server, so the server returns a representation of that document.

hierarchical URI

A URI whose path is organised hierarchically.[DODDS-DAVIS] See also: Hierarchical URIs.

IRI

Internationalized Resource Identifier. A superset of URIs that allows the full range of Unicode characters.

information resource

A resource whose essential characteristics can be conveyed entirely in a representation. See also: [AWWW].

JSON-LD

A JSON-based format for serialising Linked Data.

Linked Data

The term Linked Data refers to a set of best practices for publishing structured data on the Web. See also: [LD].

media type

A media type (also known as a MIME type) is a two-part identifier used to specify the format of a file or data on the Web, helping systems understand how to process the content.

ontology

A formal and explicit specification of a shared conceptualization, which includes the categories, properties, and relations between concepts within a specific domain.

real-world thing

Physical entities or abstract concepts that exist outside the Web, but can still be identified on the Web when assigned a URI.

representation

A concrete, serialised form (e.g. HTML, JSON, RDF) of the state of a resource that can be transmitted over the Web.

resource

The term resource is used in a general sense for whatever might be identified by a URI.[AWWW]

Semantic Web

An extension of the World Wide Web that focuses on enabling data to be shared and linked semantically.

URI

Uniform Resource Identifier. A string used to uniquely identify a resource on the Web. See also: [RFC3986].

Unicode

A universal standard for encoding characters from virtually all writing systems.

version control system

A system (e.g. Git) that records changes to files to manage revisions and collaboration.

Web document

An information resource retrievable on the Web that can be served in one or more representations.

References


1. This is scoped as such on purpose, because different (kinds of) resources can benefit from vastly different URI designs, and trying to design a general base URI template upfront will end up being either too rigid to fit all cases, or impenetrably abstract and therefore hard to use and potentially useless.
2. For a concise but accurate explanation of the difference between URIs, URNs and URLs, refer to [RFC3986], Section 1.1.3. URI, URL and URN.
3. Technically, Web documents are information resources which have a representation. The nuance here is that information resources can be represented on the Web, but do not need to have representations at any given time. This is only an academic concern, however, and in practice the terms information resource and Web document are used interchangeably throughout this document.
4. Philosophers of language have studied this phenomenon in depth, but further detail is out of scope for this document. Further reading of interest may include Ludwig Wittgenstein’s idea of family resemblances.
5. A useful consequence of 303 See Other redirects is that clients replace the original URI with the redirected one, making actions such as bookmarking or linking more straightforward.
6. These guidelines are heavily inspired by — and in quite a few cases taken almost verbatim from — [AWWW], including the conformity classification.
7. It is perfectly acceptable to have additional transient or vendor-specific URLs which identify Web documents in some system — e.g. files served from a version control system — which the authoritative URI redirects to for the purpose of representation retrieval. Moreover, these are two different domains, and it makes sense to say that technically speaking the URIs refer to different resources, e.g. "my animals ontology" and "Git revision ecd7636 of file animals.ontology.jsonld."
8. A 307 Temporary Redirect signals that the client should keep using the original URI for future requests. This is useful because it encourages use of the stable, canonical URI rather than the transient URI that is being redirected to.