Why we don’t need a .data TLD, and some insight into MetaCert’s mission

Lisa Green from Common Crawl asked me today what I thought of a blog post by Stephen Wolfram, in which he talks about the possibility of creating a .data TLD for

highlighting the exposure of data on the internet—and providing added impetus for organizations to expose data in a way that can efficiently be found and accessed.

I’ve been working on a simple problem for the past six years: Google and other search engines don’t tell you enough about a site’s content or its suitability before you visit it. To this end, I’ve been helping to create a new technology and methodology that exposes additional information that consumers will find useful. Couple this with MetaCert’s partnership with ICM Registry to label every .XXX domain, and my early involvement in the W3C Semantic Web Education and Outreach Programme, and you get an interesting mix of feelings and emotions.

I replied to Lisa’s email, but I was so compelled by the subject that I thought I’d write a post about it. This post doesn’t express the opinion of Lisa or Common Crawl in any way.

When reading Stephen’s post, I couldn’t help but feel he’s trying to invent something more complicated than the Semantic Web, and then making it even more complicated by adding a new gTLD to the mix. If the Semantic Web hasn’t seen mass adoption with the backing of the W3C over the past 15+ years, what hope does anyone have of creating a new, complicated standard that tries to achieve the same thing with a new TLD? If he were talking about a slick user interface for “consumers” to access such information more easily, without knowing or caring about the jargon, he’d be onto something. In fact, he’d be doing exactly what MetaCert is aiming to achieve. When I was Chair of the British Interactive Media Association for three years, I rarely met a designer or developer who understood the purpose of the Semantic Web – let alone what RDF is, or any of that stuff about metadata. All they care about is making sure existing search engines expose their customers’ websites.

Why create a new gTLD for the data that lies beneath websites? What would the domain look like: data.data, awesomecontent.data? Or is the idea to sell a .data domain to each site that wants to expose the metadata that lies beneath? I certainly hope not. In the end, the new gTLD application would cost about $1m: $180k is the base application fee, but as any TLD expert will tell you, that’s likely to jump to a million bucks easily with all the legal fees etc. Stephen would be much better off investing that money in MetaCert 🙂 – which has a six-year lead, a method of labeling content that is a W3C Full Recommendation, and some partnerships that will help enable adoption, but more importantly, a focus on what consumers need and sometimes want. I say sometimes want because they don’t always know what they need or want – that’s why we need to innovate.

MetaCert’s mission is to become the IMDB.com for the Web – providing consumers with more information about the content and its suitability before visiting a website. The first step was to help instigate the creation of a new standard for labeling content – this doesn’t guarantee adoption, but it certainly helps – and that piece of the puzzle alone took four and a half years.

The team is now building data sets, but only so tools can be built to make use of them – data alone is worthless unless it can be interpreted and easily consumed. The first data set we created is for the benefit of parents – they can now better protect their families from sexually explicit content. Adults who wish to find that type of content but avoid it at work also benefit.

We have the largest data set of sexually explicit content worldwide, with an index of over half a billion webpages. The data by itself is worthless. Does a mother/father know what a data set is? No. Most don’t know what a browser is, let alone a browser extension or a plug-in (they are different). So we are building family safety tools that are easy to use. We are also in the process of encouraging mainstream players to update their existing family safety controls with our data set – as browser extensions etc. aren’t scalable across the entire web. We will build data sets for other useful purposes as soon as we have provided a whole product for family safety.

Going back to Stephen’s post, the aspiration/goal is admirable and similar to mine – but they require very different implementations.

A little insight into what most consumers would like to know about websites before visiting them:

  • Is it child safe / safe to open at work or in front of my kids?
  • Is it secure?
  • Does it respect my personal information?
  • Does this site really belong to this company?
  • Is everything on this site free or do I need to pay for stuff?
  • What’s the track record of this company? (a little more tricky)
  • Is this website accessible to me (can I increase the size of text, for example)?

I’m providing examples because it’s important to always keep the consumer in mind. How do they benefit? What difference will it make to their life? Protecting children online, making our parents feel more safe and secure, helping people with disabilities find accessible content, are all benefits.



Guardian Tech Weekly podcast – my interview about MetaCert

I took part in the Guardian Tech Weekly podcast on August 2nd. I was interviewed about MetaCert and its contract with ICM Registry, which is adopting our platform to help protect families from content that will be hosted on the new .xxx Top Level Domain (TLD). This has been five years in the making – here’s a blog post from March 2007 where I talk about all of this – it explains how our method of labeling content works, in plain English.

This was my first podcast since removing myself from social media and giving up all the speaking gigs and networking events over two years ago, so I could focus on the launch of MetaCert.

The overview of the entire show:

“The Motion Picture Association of America’s man in Europe discusses its court action which will force BT to block access to Newzbin. Why this particular site, why BT, and will ISPs become judge and jury on content? Also Chrome is now the UK’s second most popular browser, and Paul Walsh of MetaCert on why labelling xxx domains will be important, and how it can be done.”

You can listen to the podcast on the Guardian website. My interview starts 21mins 45 seconds into the show.

Here’s a transcript of the interview

Interviewer: Aleks Krotoski

Aleks Krotoski: Later this year ICM Registry will begin selling new domain names that end with the web address .xxx. Most of us can imagine what kind of content will be published on the sites with this registration, but Paul Walsh, you’ve been with us the entire program, and your company believes that it’s still important to classify it.

MetaCert has entered a contract with ICM Registry to certify each XXX site, why?

Paul Walsh:

So that people can find out more information about the content and its suitability, and then make up their own mind as to whether or not they should visit that website. So, we classify each XXX website as well as other adult sites that reside on other TLDs, like .com and .org. We have a browser add-on that you can install for Firefox to begin with, with our goal being integrated with the mainstream browsers. That is, mainstream browsers take a feed from our database so they can provide a better quality search experience.

And what happens is in Google search results you get a little tiny icon beside each search result which tells you more information about the content and its suitability. So if you’re at work, for example, you’ll know whether or not it’s safe to open up that link. Or, if you’re sitting with a five-year-old, or you just simply want to exclude that information and block it for your children. It’s not about censoring the web, and it’s not about doing anything on behalf of people; we don’t decide what’s appropriate. What we do is provide more information about the content and its suitability.

And ICM in its contract with ICANN, is obliged to label every website, so we help ICM live up to that agreement.

Aleks Krotoski: What proportion do you imagine of .xxx will be unsuitable and what proportion do you think will be perfectly masoned?

Paul Walsh:

I imagine the vast majority of it will be unsuitable for children because the whole point of having the .xxx domain is actually to host adult content. And by adult content I mean specifically sexually explicit content. Companies like Disney for example are likely to buy the domain, but it will be redirected on the DNS, so it will be protected. And so they won’t host obviously, child-friendly content.

Charles Arthur: I’ve got to say, my mind is boggling at what will the icons be? And also who’s going to go through and do it.

Aleks Krotoski: Yeah, who’s going to then sort of squeeze out your brains for everything that’s gone behind your eyes after you’ve gone through all the content.

Paul Walsh:

Well, we don’t have to go through the content. We’ve actually built some cool technology. We’ve got a spidering technology which crawls adult websites and it labels automatically based on the content on those websites, and it also then pulls in all of the outbound links. So, if an adult site links to another adult site, it gets pulled into our system and it’s automatically labeled based on the information contained in the title description, the metadata and so on.
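As an illustration only, and not MetaCert’s actual code, the crawl-and-label loop described in this answer might be sketched as follows. The in-memory `PAGES` table, the `ADULT_TERMS` keyword list, the two-keyword threshold and the function names are all assumptions made up for the example; a real spider would fetch pages over the network and use far richer signals.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (title/description text, outbound links).
PAGES = {
    "http://adult-a.example": ("explicit adult videos",
                               ["http://adult-b.example", "http://news.example"]),
    "http://adult-b.example": ("adult xxx gallery", []),
    "http://news.example": ("daily technology news", ["http://adult-b.example"]),
}

# Assumed keyword list, for illustration only.
ADULT_TERMS = {"adult", "explicit", "xxx"}


def looks_adult(text):
    """Label as adult only when at least two assumed keywords appear."""
    words = set(text.lower().split())
    return len(words & ADULT_TERMS) >= 2


def crawl(seeds):
    """Label each reachable page; follow outbound links of adult pages only."""
    labels = {}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in labels or url not in PAGES:
            continue  # already labeled, or not part of the toy web
        text, links = PAGES[url]
        labels[url] = "adult" if looks_adult(text) else "unlabeled"
        if labels[url] == "adult":
            queue.extend(links)  # pull outbound links into the review queue
    return labels


labels = crawl(["http://adult-a.example"])
```

Starting from one known adult site, the sketch labels the sites it links to from their own text, so a linked news site is visited but not labeled as adult.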

Aleks Krotoski: How can this spidering technology identify the explicitness of the photograph for example? You know, what are you looking for — tag words, the classic example is you know, something about breast cancer could potentially be siphoned out if you’ve got some kind of automated system because it’s got the word breast in it and this country has a fixation with breasts.

Paul Walsh:

Great question, Aleks, and so this product is not an age appropriate product; that’s a product we will launch in the future, which will help you determine whether it’s appropriate in accordance to gambling, and violence and nudity, for example. This is a very binary, black and white decision. This really is adult content.

And how our technology deciphers between the two is the string search. We wouldn’t just label a website that had the word sex, for example, it would have to be in the same sentence that contains other words. There’s a big difference between the content on a website that talks about pornography and a website that contains pornography.

And I guess the one thing to stress about all this is that we’re not doing anything fantastically new in concept, but the technology is quite new and actually pioneering. ICRA, the Internet Content Rating Association, has been around since 1994 and they introduced a system called PICS, which is still in use by IE9 today in content advisor.

But unfortunately, it never got any traction. It never saw adoption for a number of reasons: 1) it left the responsibility up to the website owners to self label their website. For me, self regulation by itself doesn’t work, and enforcing certification doesn’t work. I think when you offer the combination of those choices it can work. 2) PICS only allowed you to make a claim about an entire website – it wasn’t possible to make a claim about an individual webpage or exclude certain webpages. With POWDER, MetaCert’s method of labeling content, it’s possible to label individual webpages. And then 3) ICRA didn’t productize PICS for each country; they were really generic in what they allowed website owners to do because what’s appropriate for an 18 year old in Germany is different in the UK.

So we created our method of classification to get it to scale; we put it through the W3C, and now it’s a ratified standard that replaced PICS.
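The “string search” mentioned earlier in this answer, where a trigger word only counts if other telling words appear in the same sentence, could be sketched roughly like this. The trigger word, the `CONTEXT_TERMS` list and the function names are hypothetical; the interview doesn’t say which words or rules the real system uses.

```python
import re

TRIGGER = "sex"
# Hypothetical context words that must share a sentence with the trigger.
CONTEXT_TERMS = {"explicit", "videos", "xxx"}


def sentences(text):
    """Split text into rough sentences on ., ! and ?."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def is_flagged(text):
    """Flag text only if one sentence holds the trigger AND a context word."""
    for sentence in sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if TRIGGER in words and words & CONTEXT_TERMS:
            return True
    return False


about_topic = "A study of sex education in schools. Explicit data tables follow."
contains_it = "Watch explicit sex videos here."
```

Here `is_flagged(contains_it)` is true while `is_flagged(about_topic)` is false, because in the second text the trigger and the context word fall in different sentences. That is the difference between a page that talks about a subject and a page that contains it.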

Charles Arthur: So do you think the sites are gonna have a sort of struggle with each other trying to get ‘we wanna get more, we wanna be classified as having everything, all human life is there and then a bit more’?

Paul Walsh:

Yeah, I think if every website owner wanted to do that it’ll be great because that will mean that we’ll turn the company into a billion dollar business sooner than I expected. I think these assertions…

Charles Arthur: Do you charge per icon then or how do you do it?

Paul Walsh:

We’ve launched with the Family Safety product just simply to get adoption because we will have the biggest database of its kind worldwide in about three months, and is therefore more likely to be adopted by the mainstream browsers with the implementation of a new parental control. Think of it as our trojan horse into this space. And we’re in discussion with some of the mainstream browsers regarding this. We’re not interested in having add-ons like McAfee SiteAdvisor and AVG LinkScanner for long – that’s not scalable across the entire web.

What we want to do is build the add-ons to get the word out on the street and then get the mainstream browsers to implement better parental controls.

So, to answer your question Charles, the type of assertions that we help website owners or we will build products for in the coming months, are a seal for privacy, so that website owners can demonstrate their commitment and conformance to a code of conduct for privacy, so consumers will have greater confidence in going to that website because they know that they follow three best practices with their personal information… a seal for identity, which means that this website really belongs to this company or this person, so when you land on the webpage you get something like a green URL bar. This helps to combat phishing.

And there are a number of other products. One in particular which might be of interest to you actually… wouldn’t it be cool if in Google search results you could see which websites offer free content products and services… you could then specify free sites and block sites that charge for stuff so your kids can’t spend on your credit card.

Jemima Kiss: Well, that kind of leads on to what I was gonna ask, which is from a point of view of a mainstream web consumer, what’s this gonna look like? How’s it gonna change the web experience so when they go to the search engine and they do a search, and it might for whatever reason inadvertently pickup inappropriate content, how’s that going to look to them? And if they did go to that site how are they gonna, what’s that experience going to be like?

Paul Walsh:

Well, right now with the Firefox add-on, when you do a search on Google, and there are no adult content websites in the searches, you don’t see anything different, it’s exactly the same. If however there’s a site in the list that has been labeled by MetaCert, you get a tiny orange icon with three Xs on it that symbolizes adult content. That’s likely to change to something of a more positive reinforcement image, like a family type icon.

Aleks Krotoski: And also perhaps not something so, Netherland-ish…Dutch.

Paul Walsh:

You’re absolutely right.

Aleks Krotoski: Thank you very much, that’s what I was looking for.

Paul Walsh:

And when you hover over the icon you get a little pop-up that says this website may contain adult content, so you know that it may contain adult content. And then when you click on the MetaCert icon in the toolbar and go into the settings, you can enable the password; which then means if you click on the link you get a message that says this website has been restricted.

And you may want that for two reasons: 1) you don’t want your children to access those websites; and 2) if you’re in work and you click on a link in an email, it launches your browser before it hits the webpage that comes up with the message saying this website is restricted because it may contain adult content.
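To make the behaviour described in this answer concrete, here is a minimal sketch of what an add-on might do with a label lookup: mark labeled results with an icon, and show the restriction message when the password setting is on. The `LABELED_ADULT` set, the `[XXX]` marker and both function names are assumptions for illustration, not the add-on’s real interface.

```python
from urllib.parse import urlparse

# Hypothetical set of hostnames that already carry an adult label.
LABELED_ADULT = {"adult-a.example"}


def annotate(results):
    """Prefix labeled search results with a text stand-in for the icon."""
    annotated = []
    for url in results:
        host = urlparse(url).hostname
        marker = "[XXX] " if host in LABELED_ADULT else ""
        annotated.append(marker + url)
    return annotated


def open_link(url, password_enabled):
    """Return the restriction message for labeled sites when protection is on."""
    host = urlparse(url).hostname
    if password_enabled and host in LABELED_ADULT:
        return "This website has been restricted."
    return "Opening " + url


shown = annotate(["http://adult-a.example/page", "http://news.example/"])
```

The check happens at the moment the link is opened, which is why it also covers links clicked from an email and not only search results.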

Aleks Krotoski: It’s going to become a badge of honor for websites.

Jemima Kiss: Yeah, what is the appeals process, say if a website isn’t, doesn’t in fact fall within your…

Paul Walsh:

Great question. We also have a website dedicated to where the web community, we launched it yesterday, can submit websites that they deem adult orientated. Those sites go into a review queue to make sure they’re adult related, and then label them.

Charles Arthur: How many sites are there up there?

Paul Walsh:

Well, there, I believe there’s something in the range of…

Charles Arthur: XXX.

Paul Walsh:

Oh, XXX, there are over…they go on sale at the end of the year, and the sunrise period happens over the next couple of months. There are over 650,000 unique pre-ordered domains currently. So it’s a significant number, but that’s even a small number in comparison to the number of .com and .org websites we intend to label. I mean, MetaCert’s system is constantly labeling sites every minute of the day.

But Aleks, to go back to your question we will also have a webpage dedicated to where site owners can dispute a site that’s been labeled incorrectly, and then we can review that and act accordingly.

Charles Arthur: Do you think just with the XXX thing, do you think there’s a possibility that people will just think boy, here be dragons, that they’ll think it’s just gonna be such a den of weirdness that you’ve gotta really take your life in your hands going there (or your PC in your hands)?

Paul Walsh:

Well, actually if we look at it from an adult perspective, the XXX has a code of conduct. And the code of conduct includes we enforce the labeling of every website, this is not an opt-in service; this is automated and it’s compulsory. Secondly, the identity of every website owner has been verified by an independent third party. And thirdly, McAfee will scan every website automatically to make sure that they’re malware-free. So there’s that code of conduct, which means if you’re inclined to visit those types of websites, the chances are you may want to visit a .XXX website before any other website because you know that you’re less likely to have malware downloaded to your computer, and you know that they’re actually having their website labeled so people can exclude that kind of content if they don’t think it’s appropriate for them or their family.

Aleks Krotoski: What are those guidelines? What does fall within the appropriate and inappropriate?

Paul Walsh:

Actually, we don’t decide what’s appropriate and what’s not appropriate. What we’ve done is created a very specific description to describe sexually explicit content. And it’s very binary. When you go to an explicitly sexual site you know it’s likely you don’t want your children visiting that website.

Aleks Krotoski: Dare I ask what it is?

Paul Walsh:

I would need to read it out, but it would contain nudity in a way that’s meant to arouse the person viewing it. So the description is very specific in that there’s a difference between nudity in an educational way… you might have some nude pictures on the Guardian; they’re not going to be labeled as sexually explicit content because the Guardian would never host such content.

Jemima Kiss: Yes, it’s going to be art…

Paul Walsh:

Or art.

Aleks Krotoski: Ah, suddenly I just went somewhere sort of down, Peter Stringfellow…

Charles Arthur: I’m picturing the Chippendales.

Aleks Krotoski: Honestly, I was headed down Peter Stringfellow. Who was it, he said that strip clubs…no, not strip clubs… He said that table dancing bars are not inappropriate because sexual arousal is only a byproduct, it’s not the intention of the site. I mean I can see the semantic web is problematic because it doesn’t actually deal with human semantics, it deals with computer semantics, and demands that builders of a website actually put this metadata on steroids into the content. And so, to create a binary one or zero for the spider and for the system is…you’re gonna get a lot of false hits aren’t you?

Paul Walsh:

I don’t think so because what you and I, and Charles and Jemima would deem as sexually explicit content I can almost guarantee you we would all agree; and then if I showed you something that contained some sexual content that wasn’t explicit, and wasn’t intended to arouse the viewer, I’m pretty sure we’d all agree on what that is as well. So, we don’t, again, we don’t tell people what’s appropriate or inappropriate.

But if you look at the definition of pornography or sexually explicit content it’s very vague even on Wikipedia, so what we did is…

Charles Arthur: Excuse me a moment, Wikipedia!

Paul Walsh:

Yes, even Wikipedia (snigger)… So we looked at sites like YouTube and so on that looked at that kind of content.

Aleks Krotoski: Well, you can find out even more and wow, do I have a lot of questions that I’d still like to ask, at http://metacert.com.

Listen to the podcast here

Download the Family Safety Add-on now



Adult sites to adopt cloud-based family safety labels from MetaCert

We issued our first press release today. I’ve made small edits so it reads a little more like a blog post.

ICM Registry, which plans to start selling .xxx domain names later this year, has entered into a multi-million dollar deal with MetaCert to label every website issued under .xxx.

Consumers expect to find out more information about products in a supermarket by looking at the label on the back. In the same way, when consumers search the Web, they want to find out more information about content and its suitability before visiting a website. MetaCert makes this possible using a pioneering cloud-based labeling technology and certification products. MetaCert will offer products that help businesses display information about their malware scanning practices, how they treat consumers’ privacy, whether they offer free or paid-for content and more. Consumers can then decide which sites to visit based on this extra information inside search results.

More on the MetaCert blog.


Sheetal Mehta Walsh speaks at TEDx


My wife Sheetal is speaking at TEDx about “Microfinance in developing and developed countries”. What more is there to say – she’s awesome and definitely the best speaker I have ever seen – about a million times better than me!

Check out her microfinance organization, Shanti Microfinance – one of the few that is truly non-profit and transparent about fees and expenses. Most microfinance organizations are either for-profit (which is ok as long as they don’t overcharge with high interest rates!) or pass currency fluctuations on to the poor entrepreneurs (hidden fees).

Sheetal is also a member of the Board at MetaCert – what an asset!


More than 500,000 .xxx domains pre-ordered

This is a worthwhile update: ICM Registry has seen a massive hockey-stick curve in .xxx domains being pre-ordered. It’s now over half a million! They won’t go on sale to the general public for another 4 or 5 months.

How about trying to estimate how many will actually sell within the first week? I’m guessing 1 million. What do you think?

Check out the number of registrations
