Linked Data and Data Science


The capacity to connect any data source in the world is in our hands today. It is what is known as the Semantic Web or the Web of Data.

When the Internet was invented, it was clearly oriented towards being human-readable. Now we need it to be machine-readable as well, which means more efficient ways to find and contextualise information.

How does this work? Simple. Assign a universal identifier to any resource on the web. A resource can be anything, such as a dataset or a web page. Having a universal identifier allows various resources to be connected to one another.

A URI (Uniform Resource Identifier) gives us the ability to name and recognise a unique resource on the web.

The idea is that you can define relations between resources in order to combine them and enrich the information about what you are researching.

Standards like RDF (Resource Description Framework), RDF Schema, and OWL allow us to describe a resource in simple terms with a set of triples (also called triplets). The parts of a triple are (subject, predicate, object). For example, we can describe ODSC as a conference with the triple (ODSC, is a, Conference). The subject is the resource that is being described. The predicate indicates the relationship, and because it points from the subject to the object, a set of triples forms a directed graph that can be queried. The object can be a literal value, such as a string, or another resource. This simple way of representing data is what enables linking between different entities and resources.
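To make the (subject, predicate, object) idea concrete, here is a minimal sketch in plain Python. It does not use an RDF library; the `URI`, `Literal`, and `Triple` classes, the helper functions, and the `http://example.org/` namespace are all assumptions made for illustration. Only the `rdf:type` URI, the standard RDF predicate for "is a", is real. The serializer writes a simplified N-Triples-like line and does not handle escaping.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class URI:
    """A resource named by a URI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain string value (the object of a triple may be one)."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


# rdf:type is the standard RDF predicate meaning "is a".
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

# Hypothetical namespace and resources, used only for this example.
EX = "http://example.org/"
ODSC = URI(EX + "ODSC")
CONFERENCE = URI(EX + "Conference")
LABEL = URI(EX + "label")

triples: List[Triple] = [
    Triple(ODSC, RDF_TYPE, CONFERENCE),   # object is another resource
    Triple(ODSC, LABEL, Literal("ODSC")), # object is a string literal
]


def to_ntriples(t: Triple) -> str:
    """Write one triple as a simplified N-Triples-like line (no escaping)."""
    if isinstance(t.obj, URI):
        obj = f"<{t.obj.value}>"
    else:
        obj = f'"{t.obj.value}"'
    return f"<{t.subject.value}> <{t.predicate.value}> {obj} ."


def outgoing_edges(graph: List[Triple], subject: URI) -> List[Tuple[URI, Union[URI, Literal]]]:
    """Treat the triples as a directed graph: list the edges leaving a subject."""
    return [(t.predicate, t.obj) for t in graph if t.subject == subject]


if __name__ == "__main__":
    for t in triples:
        print(to_ntriples(t))
```

Each triple is one labelled edge from the subject to the object, which is why a collection of triples can be read as a directed graph.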

SPARQL (“SPARQL Protocol and RDF Query Language”) is a SQL-like query language for exploring data in RDF format. You use it to query a data source that is represented in RDF.
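A SPARQL query is essentially a triple pattern with variables in some positions. The sketch below imitates that idea in plain Python, with `None` standing in for a variable. It is not a SPARQL engine; the `match` function, the sample data, and the `http://example.org/` URIs are assumptions for illustration, and the query string is shown only for comparison.

```python
from typing import Iterable, Iterator, Optional, Tuple

TripleT = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace

# The kind of SPARQL query this sketch imitates (not executed here).
SPARQL_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/>
SELECT ?event WHERE { ?event rdf:type ex:Conference . }
"""

# Made-up sample data.
data = [
    (EX + "ODSC", RDF_TYPE, EX + "Conference"),
    (EX + "ODSC", EX + "topic", EX + "DataScience"),
    (EX + "OtherConf", RDF_TYPE, EX + "Conference"),
    (EX + "SomeTool", RDF_TYPE, EX + "Software"),
]


def match(
    triples: Iterable[TripleT],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[TripleT]:
    """Yield triples matching the pattern; None acts like a SPARQL variable."""
    for subj, pred, obj in triples:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


# Equivalent of: SELECT ?event WHERE { ?event rdf:type ex:Conference . }
conferences = [subj for subj, _, _ in match(data, p=RDF_TYPE, o=EX + "Conference")]

if __name__ == "__main__":
    print(conferences)
```

A real SPARQL endpoint also supports joins across several patterns, filters, and aggregation; this sketch covers only a single pattern.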

The idea behind linked data is simple. If you have a resource then you name it with URIs, you describe it with RDF, and you combine it with SPARQL. Machines will be able to see the web as a single repository because the resources or data will be connected to each other.
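The "single repository" idea can be sketched as follows. Two independently published sets of triples are merged, and because both use the same URI for a city, a question can be answered that neither set answers alone. Everything here is an assumption for illustration: the two `example.org` publishers, the predicates, and the data values are made up.

```python
from typing import List, Set, Tuple

TripleT = Tuple[str, str, str]

EVENTS = "http://events.example.org/"  # hypothetical publisher A
PLACES = "http://places.example.org/"  # hypothetical publisher B

# Publisher A describes events and points at a city by its URI.
events_data: Set[TripleT] = {
    (EVENTS + "ODSC", EVENTS + "heldIn", PLACES + "CityX"),
}

# Publisher B describes places, using the same URI for the city.
places_data: Set[TripleT] = {
    (PLACES + "CityX", PLACES + "locatedIn", PLACES + "CountryY"),
}

# Combining the sources is just a union of triples.
web_of_data = events_data | places_data


def objects(graph: Set[TripleT], subject: str, predicate: str) -> List[str]:
    """Return all objects reachable from subject via predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def countries_for(graph: Set[TripleT], event: str) -> List[str]:
    """Follow event -> city -> country across the merged data."""
    result: List[str] = []
    for city in objects(graph, event, EVENTS + "heldIn"):
        result.extend(objects(graph, city, PLACES + "locatedIn"))
    return result


if __name__ == "__main__":
    # Only the merged graph can answer the question.
    print(countries_for(events_data, EVENTS + "ODSC"))
    print(countries_for(web_of_data, EVENTS + "ODSC"))
```

The shared URI is what makes the merge meaningful: no schema mapping is needed because both publishers name the city the same way.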

The “openness” of the Internet has made knowledge available to the many, when in the past it was restricted to the lucky few.

Open Data

Open data is more than making data sets available on a public website; it serves several purposes. It started in the public sector, but the benefits apply to any industry or sector that seeks development and innovation.

Accountability and transparency are two of the main benefits of using open data.

Data Science

In a world where data science is at the core of successful companies and organisations, open data plays an important role. An increasing number of organisations are using open data.

The most successful startups and organisations use machine learning in their processes. The combination of access to the world's data and machine learning should sound attractive to any organisation.

Data Science projects that use open data inherit its accountability and transparency. If a project uses only open data sets, then it can be scrutinised by the community and held accountable for its results.


Data.World looks like a social network. You sign up and receive updates in your timeline, such as new datasets, new projects, and recently added tags, and of course you can like them and comment on them. You can find interesting conversations in the forum as well.

They recently finished the preview (beta) release, launching the website officially. Their mascot is an owl, and a fun fact is that OWL stands for Web Ontology Language, a standard for representing rich and complex knowledge about things.

There is no restriction on the types of files you upload or download. The free account has a limit of 100 MB per project/dataset and allows up to 3 projects.

You can make your analyses public or private, and share them with friends. You can discover new data sets by browsing categories, people, or organisations. You can create rich documentation, with descriptions at file and column level, and share your notebooks. You can visualize the content and query data with SQL and SPARQL.

One of the most interesting features is the integration with other platforms like Tableau Public, Google Data Studio, Python, R, Java, CKAN, and Excel. Other platforms, like Power BI and Domino Data Lab, are coming soon. This means you can use these data visualisation platforms with data sources from data.world.

Data.world embraces the core of Linked Data, the Semantic Web, and open data (these terms are closely related and often used interchangeably), allowing you to query any dataset using SPARQL or SQL.

A list of good things about data.world:

  • It is fast.
  • Privacy is easy to adjust.
  • It has tutorials for SPARQL, SQL, and Markdown.
  • It has video tutorials that show its features.
  • It integrates with other data tools, for example Python.
  • The pace of improvement of the platform is fast and constant.


The slogan of the Semantic Web, “Anyone can say anything about anything” (AAA), by Sir Tim Berners-Lee, is powerful and creates a real web for everyone.

Just imagine that your dataset can be linked not to a single document but to any other document in the world. That is the big, big leap.

Data.world sounds cool, doesn’t it? Give it a try and let us know about your experience!



Diego Arenas, ODSC

I've worked in BI, DWH, and Data Mining, and I hold an MSc in Data Science. I have experience with multiple BI and Data Science tools, always thinking about how to solve information needs and add value to organisations from the data available. Experience with Business Objects, Pentaho, Informatica Power Center, SSAS, SSIS, SSRS, MS SQL Server from 2000 to 2017, and other DBMS, Tableau, Hadoop, Python, R, SQL. Predictive modelling. My interests are in Information Systems, Data Modeling, Predictive and Descriptive Analysis, Machine Learning, Data Visualization, Open Data. Specialties: Data modeling, data warehousing, data mining, performance management, business intelligence.