Why we don’t need a .data TLD, and some insight into MetaCert’s mission

Today Lisa Green from Common Crawl asked me what I thought of a blog post by Stephen Wolfram, in which he discusses the possibility of creating a .data TLD for:

“highlighting the exposure of data on the internet—and providing added impetus for organizations to expose data in a way that can efficiently be found and accessed.”

For the past six years I’ve been working to solve a simple problem: Google and other search engines don’t give people enough information about a site’s content, or its suitability, before they visit it. To this end, I’ve been helping to create a new technology and methodology that exposes additional information consumers will find useful. Add MetaCert’s partnership with ICM Registry to label every .XXX domain, plus my early involvement in the W3C Semantic Web Education and Outreach Programme, and you get an interesting mix of feelings and emotions.

I replied to Lisa’s email, but the subject compelled me enough that I decided to write a post about it. This post does not represent the views of Lisa or Common Crawl in any way.

Reading Stephen’s post, I couldn’t help feeling that he’s trying to reinvent a concept even more complicated than the Semantic Web, and then complicating it further by adding a new gTLD to the mix. If the Semantic Web hasn’t seen mass adoption despite the backing of the W3C over the past 15+ years, what hope does anyone have of achieving the same thing with a new, complicated standard and a new TLD? If he were talking about a slick user interface that lets “consumers” access such information easily, without knowing or caring about the jargon, he’d be onto something. In fact, he’d be describing exactly what MetaCert is aiming to achieve. During my three years as Chair of the British Interactive Media Association, I rarely met a designer or developer who understood the purpose of the Semantic Web, let alone what RDF is or any of that stuff about metadata. All they cared about was making sure existing search engines exposed their customers’ websites.

Why create a new gTLD for the data that lies beneath websites? What would the domains look like: data.data, awesomecontent.data? Or is the idea to sell a .data domain to every site that wants to expose the metadata beneath it? I certainly hope not. In the end, a new gTLD application would cost about $1m. The base application fee is $180k, but as any TLD expert will tell you, that can easily climb to a million dollars once legal fees and other costs are added. Stephen would be much better off investing that money in MetaCert :-). It has a six-year lead, its method of labeling content is a full W3C Recommendation, and it has partnerships that will help drive adoption. More importantly, it focuses on what consumers need and sometimes want. I say “sometimes want” because they don’t always know what they need or want; that’s why we need to innovate.

MetaCert’s mission is to become the IMDB.com of the Web, giving consumers more information about a website’s content and its suitability before they visit it. The first step was to help instigate the creation of a new standard for labeling content. A standard doesn’t guarantee adoption, but it certainly helps, and that piece of the puzzle alone took four and a half years.

The team is now building data sets, but only so that tools can be built on top of them; data alone is worthless unless it can be interpreted and easily consumed. The first data set we created is for the benefit of parents, who can now better protect their families from sexually explicit content. Adults who want to find that type of content but avoid it at work also benefit.

We have the largest data set of sexually explicit content in the world, with an index of over half a billion webpages. But the data by itself is worthless. Does a mother or father know what a data set is? No. Most don’t know what a browser is, let alone a browser extension or a plug-in (they are different). So we are building family safety tools that are easy to use. We are also encouraging mainstream players to update their existing family safety controls with our data set, because browser extensions and similar tools can’t scale across the entire web. We will build data sets for other useful purposes as soon as we have delivered a whole product for family safety.

Going back to Stephen’s post, his goal is admirable and similar to mine, but the two require very different implementations.

Here is a little insight into what most consumers would like to know about websites before visiting them:

  • Is it child-safe, and safe to open at work or in front of my kids?
  • Is it secure?
  • Does it respect my personal information?
  • Does this site really belong to this company?
  • Is everything on this site free, or do I need to pay for stuff?
  • What’s the track record of this company? (A little more tricky.)
  • Is this website accessible to me? (Can I increase the size of text, for example?)
  • Etc.

I’m providing these examples because it’s important to always keep the consumer in mind. How do they benefit? What difference will it make to their lives? Protecting children online, helping our parents feel safer and more secure, and helping people with disabilities find accessible content are all real benefits.

Comments


  1. Alan Dix said...

    While the goals of MetaCert are laudable and your arguments against the need for a new gTLD are reasonable, comparing the two is confusing. MetaCert is about meta-information about websites, whereas the data proposed for the .data TLD is an alternative, computational form of the content of websites.

    While TBL’s original Semantic Web vision has not materialised in the way it was conceived, data of all sorts is certainly becoming far more important, from government open data to data journalism. As you argue, adding a .data TLD and/or more standards may not be the best way to promote it, and it certainly needs to be available to real users in usable ways, but it is happening.


  2. Paul Walsh said...

    @Alan, you’re right: comparing the two is confusing. I ended up writing about MetaCert and open standards and then changed the title; it has been a while since I’ve written a post!

    I also agree with your point about governments and open data, particularly the UK Government. Unfortunately, it almost ends there. I’m an advocate for the Semantic Web and open data generally, but adding a gTLD specifically for “data” doesn’t make sense to me. I think my post was confusing, to say the least, and I hope some readers can see beyond that.

