Chrome extension that blocks over half a billion pages of pornography

Today MetaCert released the first build of its family safe browser extension for Chrome. It’s an early release, but it rocks! It blocks more than half a billion pages of pornography. That’s more than any other software application on the market.

Unlike Google SafeSearch and other applications, MetaSurf doesn’t block sites that it shouldn’t by using outdated methods such as keyword checking. It only blocks pages that have been indexed by MetaCert - with more pages being indexed every day. Check out the live counter at http://metasurf.net

Check out the extension and leave a great rating score if you like it.

Please be aware that it is impossible to stop users from disabling Chrome extensions. So we highly recommend using this extension if you want to block pornography for yourself, or for young children who are not likely to change the settings on your browser when your back is turned.

Download the Chrome extension now!


Why Google SafeSearch isn’t the answer for family safety

When you use Google SafeSearch for family safety you’re actually blocking more sites than you would hope, just like the one below.

[Image: foodporndaily]

Do you think a site displaying food should be blocked by Google SafeSearch? Yes if you’re a vegetarian I guess, but not if you’re hoping to block pornography to help protect your family from unsuitable content without restricting access to the rest of the Web.

As I was testing MetaSurf, our family-safe Firefox extension, I found a great example of why the keyword checking adopted by family safety controls like Google SafeSearch is an outdated method that hasn’t been improved since the 90’s.

If you set Google SafeSearch preferences to ‘strict’, it blocks websites that do not contain pornography, just like almost every other family safety control on the market. FoodPornDaily is blocked, yet it’s a website displaying pictures of food. They’re a little naive for using the term porn in the domain and site name, but still, it’s a good example of how keyword matching does not work. And it doesn’t stop there: SafeSearch also blocks Wikipedia and every other website on the web that ‘talks’ about porn. It even blocks websites that educate people about the dangers of accessing pornography. And if I used the term porn in this post title, it would block this too. These family safety methods are old and need to be revised. What we need is a simple ‘opt-in / opt-out’ feature that allows parents to block sites that contain pornography, without blocking those that talk about it.
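To make the difference concrete, here is a rough Python sketch that contrasts the two approaches. It is only an illustration: the keyword list, the `LABELED_PAGES` set and the `explicit-example.test` address are made up for this example, and neither function is how SafeSearch or MetaCert is actually implemented.

```python
from urllib.parse import urlsplit

# Hypothetical keyword list of the kind an old-style filter might use.
BLOCKED_KEYWORDS = {"porn"}

# Hypothetical stand-in for an index of pages known to contain explicit content.
LABELED_PAGES = {"explicit-example.test/gallery"}


def normalize(url: str) -> str:
    """Reduce a URL to 'host/path' so lookups ignore scheme, 'www.' and trailing slashes."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def keyword_filter_blocks(url: str, page_text: str) -> bool:
    """Keyword checking: block if any keyword appears anywhere in the URL or text."""
    haystack = (url + " " + page_text).lower()
    return any(word in haystack for word in BLOCKED_KEYWORDS)


def index_lookup_blocks(url: str) -> bool:
    """Index lookup: block only pages that have already been labeled."""
    return normalize(url) in LABELED_PAGES
```

With these definitions, `keyword_filter_blocks("http://foodporndaily.com/", "pictures of food")` blocks the food site because of its domain name, while `index_lookup_blocks("http://foodporndaily.com/")` leaves it alone and only blocks pages that are actually in the index.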

Interestingly, Microsoft Internet Explorer has been using PICS, the old W3C standard that was replaced by our method of labeling content in 2009, since the mid 90’s. It’s even technically impossible to label a site with PICS today, and yet IE still uses it as part of Content Advisor. It is estimated that there are fewer than 15,000 websites with PICS labels.

The AVG family safe browser application for the iPad automatically blocks the search term ‘hardcore’, even though there isn’t one search result on the first page of Google that links to a site containing pornography - blocking perfectly safe radio stations etc. Keyword checking simply does not work.

To help improve family safety online, I started a new venture called MetaCert, where we have created the largest data set of its kind: over 588 million pages that contain sexually explicit content. And our system is indexing millions more every week. If Apple, Google, Mozilla, Opera and Microsoft would like to complement their existing family safety controls with MetaCert’s data set, we will happily give them the entire data set for free, along with regular updates as we index more pages every day. We’re in advanced talks with at least one of these corporations so we must be doing something right.

Did I mention that we’re offering this data to the search engines and browsers for free?


Why I don’t think you need a coder as a cofounder

I read a post on TechCrunch today where the author claims that you can’t start a new company unless you can write code. I disagree.

I started my tech career as a computer operator at a bank and later worked at AOL during the mid 90’s, where I built my first website in 1996 as the first Technical Account Manager in Europe - my team helped to launch technologies and clients such as AIM, 56K modem speed, Internet Radio and Games, integrate browsers and more. I also built some very complex applications using one of the first ecollaborative technologies in a RAD environment when ecollaboration was a new term in 1999. Yet I write this post as a non-coding founder, as I don’t have the ability to write a single line of code that’s meaningful to our company. I designed this blog by editing the CSS but that’s the extent of my code writing (exactly, that’s not writing code). Since then, my career has led me down a fantastic path, made up of both technical and non-technical roles across the Internet and mobile industries and later, the mobile web. I’m one of the seven original founders of the W3C Mobile Web Initiative and helped to write some of the compliance specification, yet I don’t possess the ability to write the code for a site that would work on both desktop and mobile browsers.

TechCrunch, many bloggers and indeed investors believe that you must have a cofounder who can write code. This isn’t true. However, as a founder, you must possess the following:

  1. Ability to hire the best coder possible
  2. Ability to motivate the coder so they can work to the best of their ability
  3. Ability to ensure that the coder is motivated and working in a comfortable environment
  4. Ability to measure the performance of the coder, helping them to understand and take advantage of their strengths and feel ok telling you their weaknesses so you can support them
  5. Have a backup plan so that another coder can take over should the original coder leave the project at a moment’s notice, to ensure continuity of product development
  6. Ability to hire the best coder possible to take over when point 5 comes into play

My most recent experience with MetaCert is relevant too, as we are closing a series A round and our investors see the strength of the team, yet the founder doesn’t have the ability to write code. At MetaCert our main coder, Kamrul, sadly left us after 5 years of R&D. Everything was backed up and well documented. As a result, there was minimal disruption to the development of our crawling and labeling platform when we hired Paraschos, another awesome coder. What happens if Paraschos leaves? Hopefully he won’t. But if he does, everything is well documented so we would hire another coder to take over. After writing this post I will revisit point 3 to ‘help’ ensure Paraschos stays with us until we’ve managed to launch our kids browser for the iPad and beyond.

Giving a title of cofounder to a coder just because you think it’s necessary is all about ego. A non-founding coder run over by a bus, as TechCrunch puts it, is just as damaging to a company as a founding one - it makes absolutely no difference.

Note: if you don’t notice at least one typo you’ll know I’ve hired a ghost writer :)


Why we don’t need a .data TLD. And some insight into MetaCert’s mission

I was asked today by Lisa Green from Common Crawl, what I thought of a blog post written by Stephen Wolfram, where he talks about the possibility of creating a .data TLD for

highlighting the exposure of data on the internet—and providing added impetus for organizations to expose data in a way that can efficiently be found and accessed.

I’ve been working on trying to solve a simple problem for the past six years: Google and other search engines don’t provide enough information about content or its suitability before you visit a site. To this end, I’ve been helping to create a new technology and methodology that exposes additional information that consumers will find useful. Couple this with MetaCert’s partnership with ICM Registry to label every .XXX domain and my early involvement in the W3C Semantic Web Education and Outreach Programme and you get an interesting mix of feelings and emotions.

I replied to Lisa’s email, but I was so compelled by the subject that I thought I’d write a post about it. This post doesn’t express the opinion of Lisa or Common Crawl in any way.

When reading Stephen’s post, I couldn’t help but feel he’s trying to reinvent a more complicated concept than the Semantic Web and then making it even more complicated by adding a new gTLD to the mix. If the Semantic Web hasn’t seen mass adoption with the backing of the W3C over the past 15+ years, what hope does anyone have in creating a new complicated standard for trying to achieve the same thing with a new TLD? If he were talking about a slick user interface for “consumers” to access such information more easily, without knowing/caring for the jargon, he’d be onto something. In fact, he’d do exactly what MetaCert is aiming to achieve. When I was Chair of the British Interactive Media Association for three years I rarely met a designer or developer who understood the purpose of the Semantic Web - let alone what RDF is, or any of that stuff about metadata. All they care about is making sure current/existing search engines expose their customers’ websites.

Why create a new gTLD for the data that lies beneath websites? What would the domain look like? data.data? awesomecontent.data? Or is the idea to sell a .data domain to each site that wants to expose the metadata that lies beneath? I certainly hope not. In the end it would cost about $1m for the new gTLD application. $180k is the base application fee but, as any TLD expert will tell you, that’s likely to jump to a million bucks easily with all the legal fees etc. Stephen would be much better off investing that money in MetaCert :-) - which has a six year lead, a method of labeling content that is a W3C Full Recommendation, some partnerships that will help enable adoption and, more importantly, a focus on what consumers need and sometimes want. I say sometimes want because they don’t always know what they need or want - that’s why we need to innovate.

MetaCert’s mission is to become the IMDB.com for the Web - providing consumers with more information about the content and its suitability before visiting a website. The first step was to help instigate the creation of a new standard for labeling content - this doesn’t guarantee adoption, but it certainly helps - and that piece of the puzzle alone took four and a half years.

The team is now building data sets, but only so tools can be built to make use of them - data alone is worthless unless it can be interpreted and easily consumed. The first data set we created is for the benefit of parents - they can now better protect their families from sexually explicit content. Adults who wish to find that type of content, but avoid it at work also benefit.

We have the largest data set of sexually explicit content worldwide, with an index of over half a billion webpages. The data by itself is worthless. Does a mother/father know what a data set is? No. Most don’t know what a browser is, let alone a browser extension or a plug-in (they are different). So we are building family safety tools that are easy to use. We are also in the process of encouraging mainstream players to update their existing family safety controls with our data set - as browser extensions etc. aren’t scalable across the entire web. We will build data sets for other useful purposes as soon as we have provided a whole product for family safety.

Going back to Stephen’s post, the aspiration/goal is admirable and similar to mine - but they require very different implementations.

A little insight into what most consumers would like to know about websites before visiting them:

  • Is it child safe / safe to open at work or in front of my kids?
  • Is it secure?
  • Does it respect my personal information?
  • Does this site really belong to this company?
  • Is everything on this site free or do I need to pay for stuff?
  • What’s the track record of this company? (a little more tricky)
  • Is this website accessible to me? (can I increase the size of text, for example)
    Etc.

I’m providing examples because it’s important to always keep the consumer in mind. How do they benefit? What difference will it make to their life? Protecting children online, making our parents feel more safe and secure, helping people with disabilities find accessible content, are all benefits.


Guardian Tech Weekly podcast - my interview about MetaCert

I took part in the Guardian Tech Weekly podcast on August 2nd. I was interviewed about MetaCert and its contract with ICM Registry, which is adopting our platform to help protect families from content that will be hosted on the new .xxx Top Level Domain (TLD). This has been five years in the making - here’s a blog post from March 2007 where I talk about all of this - it explains how our method of labeling content works, in plain English.

This was my first podcast since removing myself from social media and giving up all the speaking gigs and networking events over two years ago, so I could focus on the launch of MetaCert.

The overview of the entire show:

“The Motion Picture Association of America’s man in Europe discusses its court action which will force BT to block access to Newzbin. Why this particular site, why BT, and will ISPs become judge and jury on content? Also Chrome is now the UK’s second most popular browser, and Paul Walsh of MetaCert on why labelling xxx domains will be important, and how it can be done.”

You can listen to the podcast on the Guardian website. My interview starts 21 minutes 45 seconds into the show.

Here’s a transcript of the interview

Interviewer: Aleks Krotoski

Aleks Krotoski: Later this year ICM Registry will begin selling new domain names that end with the web address .xxx. Most of us can imagine what kind of content will be published on the sites with this registration, but Paul Walsh, you’ve been with us the entire program, and your company believes that it’s still important to classify it.

MetaCert has entered a contract with ICM Registry to certify each XXX site, why?

Paul Walsh:

So that people can find out more information about the content and its suitability, and then make up their own mind as to whether or not they should visit that website. So, we classify each XXX website as well as other adult sites that reside on other TLDs, like .com and .org. We have a browser add-on that you can install for Firefox to begin with, with our goal being integrated with the mainstream browsers. That is, mainstream browsers take a feed from our database so they can provide a better quality search experience.

And what happens is in Google search results you get a little tiny icon beside each search result which tells you more information about the content and its suitability. So if you’re in work for example, you’ll know whether or not it’s safe to open up that link. Or, if you’re sitting with a five year old, or you just simply want to exclude that information and block it for your children. It’s not about censoring the web, and it’s not about doing anything on behalf of people; we don’t decide what’s appropriate. What we do is provide more information about the content and its suitability.

And ICM in its contract with ICANN, is obliged to label every website, so we help ICM live up to that agreement.

Aleks Krotoski: What proportion do you imagine of .xxx will be unsuitable and what proportion do you think will be perfectly masoned?

Paul Walsh:

I imagine the vast majority of it will be unsuitable for children because the whole point of having the .xxx domain is actually to host adult content. And by adult content I mean specifically sexually explicit content. Companies like Disney for example are likely to buy the domain, but it will be redirected on the DNS, so it will be protected. And so they won’t host obviously, child-friendly content.

Charles Arthur: I’ve got to say, my mind is boggling at what will the icons be? And also who’s going to go through and do it.

Aleks Krotoski: Yeah, who’s going to then sort of squeeze out your brains for everything that’s gone behind your eyes after you’ve gone through all the content.

Paul Walsh:

Well, we don’t have to go through the content. We’ve actually built some cool technology. We’ve got a spidering technology which crawls adult websites and it labels automatically based on the content on those websites, and it also then pulls in all of the outbound links. So, if an adult site links to another adult site, it gets pulled into our system and it’s automatically labeled based on the information contained in the title description, the metadata and so on.
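For readers who like to see the idea in code, the link-following described above can be sketched in a few lines of Python. This is not MetaCert’s actual crawler. The `fetch` and `is_adult` callbacks, the `FAKE_WEB` dictionary and the `.test` addresses are all hypothetical, and the "classifier" just looks for one word in the page title.

```python
from collections import deque

# Hypothetical in-memory "web" so the sketch runs without any network access.
FAKE_WEB = {
    "adult-a.test": {"title": "xxx adult videos", "links": ["adult-b.test", "news.test"]},
    "adult-b.test": {"title": "explicit adult galleries", "links": ["adult-a.test"]},
    "news.test": {"title": "daily news headlines", "links": ["adult-b.test", "blog.test"]},
    "blog.test": {"title": "xxx adult blog", "links": []},
}


def title_says_adult(page):
    """Toy stand-in for a real classifier: looks at the title text only."""
    return "adult" in page["title"]


def crawl_and_label(seeds, fetch, is_adult):
    """Breadth-first crawl that labels adult pages and follows only their outbound links."""
    labeled = set()
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None or not is_adult(page):
            continue  # not labeled, and its outbound links are not followed
        labeled.add(url)
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return labeled
```

Calling `crawl_and_label(["adult-a.test"], FAKE_WEB.get, title_says_adult)` labels the two sites that link to each other. In this sketch `blog.test` is never visited, because the only page linking to it was not labeled.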

Aleks Krotoski: How can this spidering technology identify the explicitness of the photograph for example? You know, what are you looking for — tag words, the classic example is you know, something about breast cancer could potentially be siphoned out if you’ve got some kind of automated system because it’s got the word breast in it and this country has a fixation with breasts.

Paul Walsh:

Great question, Aleks, and so this product is not an age appropriate product; that’s a product we will launch in the future, which will help you determine whether it’s appropriate in accordance to gambling, and violence and nudity, for example. This is a very binary, black and white decision. This really is adult content.

And how our technology deciphers between the two is the string search. We wouldn’t just label a website that had the word sex, for example, it would have to be in the same sentence that contains other words. There’s a big difference between the content on a website that talks about pornography and a website that contains pornography.

And I guess the one thing to stress about all this is that we’re not doing anything fantastically new in concept, but the technology is quite new and actually pioneering. ICRA, the Internet Content Rating Association, has been around since 1994 and they introduced a system called PICS, which is still in use by IE9 today in content advisor.

But unfortunately, it never got any traction. It never saw adoption for a number of reasons: 1) it left the responsibility up to the website owners to self label their website. For me, self regulation by itself doesn’t work, and enforcing certification doesn’t work. I think when you offer the combination of those choices it can work. 2) PICS only allowed you to make a claim about an entire website - it wasn’t possible to make a claim about an individual webpage or exclude certain webpages. With POWDER, MetaCert’s method of labeling content, it’s possible to label individual webpages. And then 3) ICRA didn’t productize PICS for each country; they were really generic in what they allowed website owners to do because what’s appropriate for an 18 year old in Germany is different in the UK.

So we created our method of classification to get it to scale; we put it through the W3C and now it’s a ratified standard that replaced PICS.
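As an aside, the same-sentence rule mentioned in the answer above could look roughly like the Python sketch below. The word lists are hypothetical and far too small to be useful, and this is not the string search our platform actually runs. It only shows why requiring two signals in one sentence cuts down on false matches.

```python
import re

# Hypothetical word lists, chosen only to illustrate the rule.
TRIGGER_WORDS = {"sex"}
SUPPORTING_WORDS = {"explicit", "xxx", "hardcore"}


def sentences(text):
    """Very rough sentence splitter: breaks on '.', '!' and '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def looks_explicit(text):
    """Flag text only if a trigger word and a supporting word share a sentence."""
    for sentence in sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & TRIGGER_WORDS and words & SUPPORTING_WORDS:
            return True
    return False
```

A page saying "Sex education is taught in schools. This station plays hardcore punk." is not flagged, because the two words never appear in the same sentence, whereas a single sentence containing both is.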

Charles Arthur: So do you think the sites are gonna have a sort of struggle with each other trying to get ‘we wanna get more, we wanna be classified as having everything, all human life is there and then a bit more’?

Paul Walsh:

Yeah, I think if every website owner wanted to do that it’ll be great because that will mean that we’ll turn the company into a billion dollar business sooner than I expected. I think these assertions…

Charles Arthur: Do you charge per icon then or how do you do it?

Paul Walsh:

We’ve launched with the Family Safety product just simply to get adoption because we will have the biggest database of its kind worldwide in about three months, and is therefore more likely to be adopted by the mainstream browsers with the implementation of a new parental control. Think of it as our trojan horse into this space. And we’re in discussion with some of the mainstream browsers regarding this. We’re not interested in having add-ons like McAfee SiteAdvisor and AVG LinkScanner for long - that’s not scalable across the entire web.

What we want to do is build the add-ons to get the word out on the street and then get the mainstream browsers to implement better parental controls.

So, to answer your question Charles, the type of assertions that we help website owners or we will build products for in the coming months, are a seal for privacy, so that website owners can demonstrate their commitment and conformance to a code of conduct for privacy, so consumers will have greater confidence in going to that website because they know that they follow three best practices with their personal information… a seal for identity, which means that this website really belongs to this company or this person, so when you land on the webpage you get something like a green URL bar. This helps to combat phishing.

And there are a number of other products. One in particular which might be of interest to you actually… wouldn’t it be cool if in Google search results you could see which websites offer free content products and services… you could then specify free sites and block sites that charge for stuff so your kids can’t spend on your credit card.

Jemima Kiss: Well, that kind of leads on to what I was gonna ask, which is from a point of view of a mainstream web consumer, what’s this gonna look like? How’s it gonna change the web experience so when they go to the search engine and they do a search, and it might for whatever reason inadvertently pickup inappropriate content, how’s that going to look to them? And if they did go to that site how are they gonna, what’s that experience going to be like?

Paul Walsh:

Well, right now with the Firefox add-on, when you do a search on Google, and there are no adult content websites in the searches, you don’t see anything different, it’s exactly the same. If however there’s a site in the list that has been labeled by MetaCert, you get a tiny orange icon with three Xs on it that symbolizes adult content. That’s likely to change to something of a more positive reinforcement image, like a family type icon.

Aleks Krotoski: And also perhaps not something so, Netherland-ish…Dutch.

Paul Walsh:

You’re absolutely right.

Aleks Krotoski: Thank you very much, that’s what I was looking for.

Paul Walsh:

And when you hover over the icon you get a little pop-up that says this website may contain adult content, so you know that it may contain adult content. And then when you click on the MetaCert icon in the toolbar and go into the settings, you can enable the password; which then means if you click on the link you get a message that says this website has been restricted.

And you may want that for two reasons: 1) you don’t want your children to access those websites; and 2) if you’re in work and you click on a link in an email, it launches your browser before it hits the webpage that comes up with the message saying this website is restricted because it may contain adult content.

Aleks Krotoski: It’s going to become a badge of honor for websites.

Jemima Kiss: Yeah, what is the appeals process say if a website isn’t, doesn’t in fact fall within your…

Paul Walsh:

Great question. We also have a website dedicated to where the web community, we launched it yesterday, can submit websites that they deem adult orientated. Those sites go into a review queue to make sure they’re adult related, and then label them.

Charles Arthur: How many sites are there up there?

Paul Walsh:

Well, there, I believe there’s something in the range of…

Charles Arthur: XXX.

Paul Walsh:

Oh, XXX, there are over…they go on sale at the end of the year, and the sunrise period happens over the next couple of months. There are over 650,000 unique pre-ordered domains currently. So it’s a significant number, but that’s even a small number in comparison to the number of .com and .org websites we intend to label. I mean, MetaCert’s system is constantly labeling sites every minute of the day.

But Aleks, to go back to your question we will also have a webpage dedicated to where site owners can dispute a site that’s been labeled incorrectly, and then we can review that and act accordingly.

Charles Arthur: Do you think just with the XXX thing, do you think there’s a possibility that people will just think boy, here be dragons, that they’ll think it’s just gonna be such a den of weirdness that you’ve gotta really take your life in your hands going there (or your PC in your hands)?

Paul Walsh:

Well, actually if we look at it from an adult perspective, the XXX has a code of conduct. And the code of conduct includes we enforce the labeling of every website, this is not an opt-in service; this is automated and it’s compulsory. Secondly, the identity of every website owner has been verified by an independent third party. And thirdly, McAfee will scan every website automatically to make sure that they’re malware-free. So there’s that code of conduct, which means if you’re inclined to visit those types of websites, the chances are you may want to visit a .XXX website before any other website because you know that you’re less likely to have malware downloaded to your computer, and you know that they’re actually having their website labeled so people can exclude that kind of content if they don’t think it’s appropriate for them or their family.

Aleks Krotoski: What are those guidelines? What does fall within the appropriate and inappropriate?

Paul Walsh:

Actually, we don’t decide what’s appropriate and what’s not appropriate. What we’ve done is created a very specific description to describe sexually explicit content. And it’s very binary. When you go to an explicitly sexual site you know it’s likely you don’t want your children visiting that website.

Aleks Krotoski: Dare I ask what it is?

Paul Walsh:

I would need to read it out, but it would contain nudity in a way that’s meant to arouse the person viewing it. So the description is very specific in that there’s a difference between nudity in an educational way… you might have some nude pictures on the Guardian; they’re not going to be labeled as sexually explicit content because the Guardian would never host such content.

Jemima Kiss: Yes, it’s going to be art…

Paul Walsh:

Or art.

Aleks Krotoski: Ah, suddenly I just went somewhere sort of down, Peter Stringfellow…

Charles Arthur: I’m picturing the Chippendales.

Aleks Krotoski: Honestly, I was headed down Peter Stringfellow. Who was it, he said that strip clubs…no, not strip clubs… He said that table dancing bars are not inappropriate because sexual arousal is only a byproduct, it’s not the intention of the site. I mean I can see the semantic web is problematic because it doesn’t actually deal with human semantics, it deals with computer semantics, and demands that builders of a website actually put this metadata on steroids into the content. And so, to create a binary one or zero for the spider and for the system is…you’re gonna get a lot of false hits aren’t you?

Paul Walsh:

I don’t think so because what you and I, and Charles and Jemima would deem as sexually explicit content I can almost guarantee you we would all agree; and then if I showed you something that contained some sexual content that wasn’t explicit, and wasn’t intended to arouse the viewer, I’m pretty sure we’d all agree on what that is as well. So, we don’t, again, we don’t tell people what’s appropriate or inappropriate.

But if you look at the definition of pornography or sexually explicit content it’s very vague even on Wikipedia, so what we did is…

Charles Arthur: Excuse me a moment, Wikipedia!

Paul Walsh:

Yes, even Wikipedia (snigger)… So we looked at sites like YouTube and so on that looked at that kind of content.

Aleks Krotoski: Well, you can find out even more and wow, do I have a lot of questions that I’d still like to ask, at http://metacert.com.

Listen to the podcast here

Download the Family Safety Add-on now
