Microdata 2 RDF Extractor

getSchema’s Microdata extractor is a REST web service that extracts RDF [1] data from Microdata [2] annotations and provides the semantic information as N-Triples [5], N3 [3] or JSON [4]. The service conforms to the Microdata to RDF specification [11] at W3C, although its generation algorithm may differ from the one proposed by the specification.
This service is powered by node.js [9] and uses the jsdom [10] library.

Use our test form to try the service: http://getschema.org/microdataextractor-test

Get some test examples from: http://getschema.org/microdata2rdf/examples/

API Endpoint

The service endpoint is http://getschema.org/microdataextractor

API Parameters

The following parameters are required:

url
The URL of the page containing Microdata annotations. If its value is not a valid URL, or the URL does not locate a resource, the service will return an error.
out
A parameter specifying the type of desired output. The allowed values are rdf, n3 and json. Any other value is treated as invalid and the service will return an error.

If either parameter is missing, the service will return an error.

The service allows only GET requests. Any other request type will return an error.

Test the service using this URL: http://getschema.org/microdataextractor?url=http%3A%2F%2Fgetschema.org%2Fmicrodata2rdf%2Fexamples%2Fexample.html&out=rdf

Table Of Contents

  1. Requests
  2. Responses
  3. Errors
  4. Limitations
  5. Known Usages
  6. Terms of Service
  7. Contact
  8. References

1. Requests

Requests sent to the API endpoint must be HTTP GET requests, with all arguments sent as query parameters.
All arguments must be URL-encoded (as per RFC 3986 [7]).

Parameters

url
The URL of the page containing Microdata annotations. If its value is not a valid URL, or the URL does not locate a resource, the service will return an error.
out
A parameter specifying the type of desired output. The allowed values are rdf, n3 and json. Any other value is treated as invalid and the service will return an error.
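
As an illustration of the request rules above, here is a minimal Python sketch that validates both parameters on the client side and builds a percent-encoded request URL. The endpoint constant is an assumption taken from the test URL given in this document, and the names build_request_url and ALLOWED_OUT are hypothetical helpers, not part of the service.

```python
from urllib.parse import quote, urlencode, urlsplit

# Assumption: endpoint taken from the test URL shown in this document.
ENDPOINT = "http://getschema.org/microdataextractor"
# The three values the documentation allows for the out parameter.
ALLOWED_OUT = ("rdf", "n3", "json")


def build_request_url(page_url: str, out: str) -> str:
    """Return a GET URL carrying both required parameters, percent-encoded."""
    if out not in ALLOWED_OUT:
        raise ValueError(f"out must be one of {ALLOWED_OUT}, got {out!r}")
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    # quote_via=quote gives RFC 3986 percent-encoding (spaces become %20).
    query = urlencode({"url": page_url, "out": out}, quote_via=quote)
    return f"{ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_request_url(
        "http://getschema.org/microdata2rdf/examples/example.html", "rdf"))
```

Checking the parameters locally only avoids an avoidable round trip; the service remains the authority on what it accepts.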

2. Responses

Consider the following HTML example and find below various possible service responses.

HTML Example

(See http://getschema.org/microdata2rdf/examples/example.html)

The out parameter will be changed according to the desired output format.

2.1 N-Triples Response

An N-Triples [5] response is sent when the out parameter is set to rdf (out=rdf). The response uses the Content-type text/plain.
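
Since an N-Triples body holds one statement per line, a client can split it with very little code. The following Python sketch is a simplified reader under stated assumptions: it handles the common subject, predicate, object line shape and comment lines, it does not decode escape sequences, and it is not a full implementation of the N-Triples grammar. The function names and the sample lines in the usage part are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Subject: IRI or blank node; predicate: IRI; object: the rest up to the final dot.
_TRIPLE = re.compile(r"^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$")


def parse_ntriples_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one statement into raw terms; return None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = _TRIPLE.match(stripped)
    if match is None:
        raise ValueError(f"not an N-Triples statement: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def parse_ntriples(text: str) -> List[Tuple[str, str, str]]:
    """Parse a whole response body, skipping blank lines and comments."""
    parsed = (parse_ntriples_line(line) for line in text.splitlines())
    return [triple for triple in parsed if triple is not None]


if __name__ == "__main__":
    # Hypothetical body, not actual service output.
    body = (
        '_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .\n'
        '_:b0 <http://schema.org/name> "Jane Doe" .\n'
    )
    for triple in parse_ntriples(body):
        print(triple)
```

For anything beyond quick inspection, a dedicated RDF library is the safer choice.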

RDF Example

2.2 N3 Response

An N3 [3] response is sent when the out parameter is set to n3 (out=n3). The response uses the Content-type text/n3.

N3 Example

2.3 JSON Response

A JSON response is sent when the out parameter is set to json (out=json). The response format follows Talis RDF-JSON [6]. It is a well-formed JSON document delivered using the Content-type application/json.
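
Talis RDF-JSON nests the data as subject, then predicate, then a list of object descriptions, each with a value and a type (uri, literal or bnode). The Python sketch below flattens such a document into triples. The sample document is hypothetical and is not actual service output; iter_triples is an illustrative helper name.

```python
import json
from typing import Dict, Iterator, Tuple


def iter_triples(doc: Dict[str, dict]) -> Iterator[Tuple[str, str, dict]]:
    """Yield (subject, predicate, object description) from an RDF-JSON document."""
    for subject, predicates in doc.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                yield subject, predicate, obj


# Hypothetical RDF-JSON document used only for illustration.
SAMPLE = json.loads("""
{
  "_:b0": {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      {"value": "http://schema.org/Person", "type": "uri"}
    ],
    "http://schema.org/name": [
      {"value": "Jane Doe", "type": "literal"}
    ]
  }
}
""")

if __name__ == "__main__":
    for subject, predicate, obj in iter_triples(SAMPLE):
        print(subject, predicate, obj["type"], obj["value"])
```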

JSON Example

3. Errors

All service errors are delivered in JSON format. The following kinds of errors may occur:

Service unavailable
When the service is down:
503 – Service Temporarily Unavailable
Missing parameters or validation errors
When one of the required parameters (url or out) is missing or does not comply with the expected format:
409 – Conflict
URL not reachable
When the URL does not identify a physical resource:
502 – Bad Gateway
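
A client can translate these status codes into readable messages. The following Python sketch uses only the standard library; the message wording and the names describe_error and fetch are hypothetical. Because the exact structure of the JSON error body is not documented here, the sketch passes it along without interpreting its fields.

```python
import json
import urllib.error
import urllib.request

# Status codes as listed in this section; the descriptions are paraphrases.
ERROR_MEANINGS = {
    409: "a required parameter (url or out) is missing or invalid",
    502: "the target URL could not be reached",
    503: "the service is temporarily unavailable",
}


def describe_error(status: int) -> str:
    """Map an HTTP status code to the error kind documented above."""
    return ERROR_MEANINGS.get(status, f"unexpected HTTP status {status}")


def fetch(request_url: str, timeout: float = 10.0) -> str:
    """GET the given URL and return the body; raise RuntimeError on HTTP errors."""
    try:
        with urllib.request.urlopen(request_url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        raw = err.read().decode("utf-8", errors="replace")
        try:
            details = json.loads(raw)  # errors are documented as JSON
        except ValueError:
            details = raw  # fall back to the raw text if it is not JSON
        raise RuntimeError(
            f"{err.code}: {describe_error(err.code)}; details: {details!r}"
        ) from err
```

Only GET is used here, in line with the restriction that the service rejects every other request type.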

4. Limitations

Iframes are not loaded.

Scripts are loaded when the script element is annotated with an itemtype attribute with the value http://schema.org/WebPageElement/Script.

There might be other limitations regarding triple extraction, such as duplicate triples, since the service is still in beta.

While itemid is supported, itemref is not.

The property schema:additionalType is not processed, and multiple item types for the same itemscope are not yet supported either.

5. Known Usages

RuleTheWeb! – A Firefox Extension consuming Schema.org Annotations

6. Terms of Service

This service is offered free of charge by http://binarypark.org

You must follow any policies made available to you within the Services.

We trust that you will not misuse this service and hope you find it helpful. However, just in case:

Using this service does not give you ownership of any intellectual property rights related to the service or the content you access. You may not use content from our Services unless you obtain permission from its owner or are otherwise permitted by law. These terms do not grant you the right to use any branding or logos used in this service. Don’t remove, obscure, or alter any legal notices displayed in or along with the service.

This service provides content that is not owned by the service provider. This content is the sole responsibility of the entity that makes it available.

The terms of use can change at any time, and it is not the provider's responsibility to inform you.

Further information may be found at http://binarypark.org.

7. Contact

If you would like to learn more about this service or contribute to it, please contact us at mtg(at)binarypark.org.

8. References

[1] Resource Description Framework (RDF), http://www.w3.org/RDF/

[2] HTML5 Microdata, http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html

[3] Notation3 (N3): A readable RDF syntax, http://www.w3.org/TeamSubmission/n3/

[4] JavaScript Object Notation, http://json.org/

[5] RDF N-Triples Syntax, http://www.w3.org/TR/rdf-testcases/#ntriples but also http://www.w3.org/2011/rdf-wg/wiki/N-Triples-Format

[6] RDF-JSON Specification, http://docs.api.talis.com/platform-api/output-types/rdf-json

[7] Uniform Resource Identifier (URI): Generic Syntax (RFC3986), http://www.ietf.org/rfc/rfc3986.txt

[8] Web Application Description Language (WADL), http://www.w3.org/Submission/wadl/

[9] node.js, http://nodejs.org

[10] jsdom – A JavaScript implementation of the DOM, for use with node.js, https://github.com/tmpvar/jsdom

[11] Microdata to RDF: Transformation from HTML+Microdata to RDF, W3C Interest Group Note 08 March 2012, http://www.w3.org/TR/microdata-rdf/