Meet the 2014 Founders: MyPrepApp. Motivation, Not Information.

As we continue to introduce the Founders from StartupYard 2014 and their products, we bring you Vaclav Formánek, Founder and CEO of Educasoft, maker of MyPrepApp, a motivational planning device for exam preparation. 
Vaclav Formanek, getting passionate about education.

Vasek, tell us about MyPrepApp, and Educasoft.

MyPrepApp is a mobile and web application that helps students succeed on their important exams. It’s a way for students to avoid the stress of major exams without avoiding the actual studying: it gives you a reason to study, and it makes the process fun and, we hope, a lot less painful.

Did you have trouble studying as a kid?

I was a kind of nerd as a kid. I started to have some problems with studying at high school and university, as I found there were much more interesting things to do than studying.

I started this project with my best friend Ondřej Menčel (Ondřej is CTO of Educasoft) more than two years ago. We loved playing games and we were fascinated with their motivational power. We were asking ourselves a question: if games are so cool that they motivate us to spend hours and hours solving problems in a virtual world, couldn’t we use some of their power to motivate ourselves to do real things, such as studying?

I was never a good studier. I guess that’s the typical experience, but it creates a lot of stress. I could never decide what the important stuff was, or how to prioritize when I was studying. So I would procrastinate, and end up cramming for the exam at the last second out of panic. Everybody’s had that experience, right? Studying was boring and nothing motivated me to start early. Have you ever had that dream where you show up for a test, but you aren’t prepared, and you don’t know what to do? That’s our inspiration.

MyPrepApp is molded out of our personal experiences. It creates a tailored study plan for exam preparation, and uses game rewards and the support of friends to enhance students’ motivation to follow the plan and reach their study goals.

When I say “we,” I am talking about our company, Educasoft. Educasoft is a team of people who want to give students a better way to prepare for exams.

Ondrej and Vasek taking a break on the TechSquare swing set.

Your team has already launched and generated revenue with a similar service in the Czech Republic: Hrave.cz. How did Hrave become MyPrepApp?

Well, we launched “Maturita hravě,” our first product, in preparation for the Czech exit examination, just a few days before the exam actually took place. So it was really a baptism by fire. It was just a last minute thing, so you can see a pattern here!

But we were really surprised by the results. Within the first week, more than 5000 students tried out Hrave, and feedback was mostly very positive. When we were thinking about what to do next, we decided to focus on what was crucial for passing the exam, and what’s really missing from existing products for exam preparation: tailored study plans and enhanced motivation to study. User feedback showed that the main problem with studying wasn’t informational, but motivational. This became the basis of the MyPrepApp model.

 

The education technology field is crowded. What makes MyPrepApp a potential stand-out in your thinking?

We take a different approach towards studying for exams. We see succeeding on exams as the same type of goal as, for example, being able to run a marathon or losing 10 kilos, and we think we can use similar methods to help people achieve these goals. That’s why we are inspired by successful fitness and running apps such as Endomondo.

We are focused on students with low self-motivation. Students who need a study plan and who need to be intensively pushed to follow it. We think that this group of students has been ignored by existing exam preparation products. Most of these, like Kaplan Test Prep, Magoosh, or BenchPrep just assume the student is motivated from the outset… but we know that isn’t the case.

Our goal is to be the best preparation app for those students – the ones who need someone to tell them what to study and motivate them to do so.

What are the technical and business challenges you think you’re going to face in the next year or so?

The big technical challenge for us is the creation of the study plan. We take it very seriously, as by recommending what students should study, we become partly responsible for them and their results. To be able to create a good study plan, we need to combine knowledge from many different areas – from the perfect knowledge of tests to the psychology of learning.

As for business challenges, the biggest one will be to enter the US market. I think we will need a business partner to do it in the most effective way.

What strategy are you pursuing for bringing the platform to a global market? How will you secure and grow a strong content network?

We have been developing the platform itself to be content independent, so it can be used for most standardized exams, no matter which system they are for, in the Czech Republic, Poland or the US. While the exam systems are very different between different countries, our approach can remain constant.

As it is quite easy for content creators to use our platform, we can choose the best strategy for getting the relevant educational content for different countries and exams. Similarly, we can choose the best strategy to market MyPrepApp in different countries. We are now in the process of deciding for which countries to find strategic partners, and in which we can branch out on our own.

Which of the mentors at StartupYard have had the most profound impact on Educasoft during the past few months? How has the accelerator been for your team?

Generally, the mentor sessions have helped us a lot to make our plans more precise, and prioritize the next steps. Roman Smola (Founder of Glogster EDU) had amazing knowledge about how to be successful in the US market with educational products. Vit Horky (CEO of Brand Embassy) has a really interesting approach to business development that we learned a lot from.

Unfortunately, I was the only team member who could attend most of the program during the first month of the accelerator, as the rest of the team had to stay home working on the app so we could launch it as soon as possible. Even so, we have found the accelerator very useful.

Meet the 2014 Founders: Gjirafa, Albania/Kosovo’s answer to Google

In our continuing series, we are introducing the StartupYard 2014 teams in individual interviews with their founders and key members at the accelerator. Here we introduce Gjirafa, in the words of CEO and Founder Mergim Cahani, of Kosovo. 

 

Mergim, how would you describe Gjirafa in a few words?

It’s an awesome animal with a long neck :laughs:.

Gjirafa is a full-text web search engine and a news aggregator specialized in the Albanian language. Gjirafa will bring relevant, easily accessible information to over 12 million Albanian-speaking people worldwide.

So it’s Google For Albanian Speakers. Isn’t That Job Already Taken (by Google)?

You could say the same thing about Seznam or Yandex (the Russian search giant), but they’ve thrived in competition with Google. That’s a great model for us moving forward. Competition between Seznam and Google has brought better results for consumers in the Czech Republic. Google doesn’t own the internet, and it shouldn’t.

And no, we aren’t Google. We have something that Google does not have. Gjirafa has access to local data, understands the market, and has been developing technology for full-text search in the Albanian language. That’s something no one else has ever done, including Google.

Albanian stands alone as a language with no relatives.

Gjirafa is turning quite a few heads with our mentors at StartupYard. Why do you think that is?

Our team is built to impress, with a very strong business and academic background. Our three founders have a combined 30+ years of experience, one previous successful startup, four master’s degrees and one PhD. The advisory board features prominent figures in web search and management, Prof. Torsten Suel and Prof. Jay Nathan respectively.

We are very happy to be getting so much positive attention, but it is important to note that the mentors’ input and constructive feedback are shaping our product and company further. From day one at StartupYard our value proposition started to get better and better thanks to mentors’ feedback. The reason why most mentors and investors are interested, we think, is that our project has the prerequisites to make it promising: a strong team, an excellent market potential, and the technology – specifically our differentiating product features.

Mergim Cahani: Founder and CEO of Gjirafa

What brought you to StartupYard? What have been the benefits for you, so far?

I am certain that StartupYard is de facto the best accelerator that our team and project could have picked. In fact it is the only accelerator that we wanted to be part of (within the context of this project). It has just about all the ingredients of other accelerators, including the ones from Silicon Valley, and then some – that directly gives us better opportunities and increases our chances of success.

Mentors, investors, angels and VCs involved with StartupYard can more easily comprehend the potential of our project in our target market than investors from other geographic areas. There are great similar success stories in the Czech Republic, and some of these investors are involved directly in those projects (www.seznam.cz is one example). They understand our product, they recognize its potential, and they have a clear idea of what it takes to reach our goal. This way, they can provide feedback that is so vital to company success, and some have already shown interest in being part of this journey.

Where to start with the benefits of StartupYard? :laughs: We love Prague, and StartupYard at TechSquare has an amazing working environment, great people, a lot of events, and, can’t forget, great Czech beer. As far as accelerating our project growth, we have met some industry leaders, chairpersons, CEOs, and investors from world-leading corporations, who really helped shape our product and increase our value proposition immensely. Also there are a lot of perks, to mention one: we are en route to becoming a BizSpark Plus company (that is around $60,000 in Azure credit that we were planning to spend). Last but not least, the people who run StartupYard know their business: they have a proven track record and experience that was evident from day one.

left: Cedric Maloux, Director Startup Yard. Right: Mergim Cahani, Founder CEO, Gjirafa

What are your near-term goals for Gjirafa? What products and services will be part of the ecosystem at launch?

Our near-term goal is to launch within two months. We are planning to include a few “elect” services at the beginning. That means a full text search, news aggregation, a transport scheduler for Kosovo, Albania and Macedonia, weather widget, and Albanian web facts. All these services are one of a kind, as they currently do not exist anywhere. The obvious exception is text search, where Google is a player, but we think we can do a better job, as we are focused only on one language and one specific segment of the web. That’s worked for Seznam, and we think they’ve shown us the way to success against the Google Goliath.

How about your long term goals?

Our long-term goal is to become the front page of the Albanian-speaking web: to be synonymous with “Internet” in the Albanian mind. If you speak Albanian, when you open a browser, it will open on www.gjirafa.com. We will provide highly relevant services and ease of access to information that is geographically localized and based on the Albanian language. Gjirafa will be more than just a useful search engine; it will be everywhere for everything. I will not speak to specific services that we plan, but I can tell you that there is a full list in the queue that we are prioritizing, each one of them more valuable than the next.

As a sneak peek, enabling e-commerce in Albania and Kosovo, at this moment, tops the list of our long-term goals. Replicating the platform to other Balkan peninsula countries is also a viable option.

You’ve mentioned developing a unique search engine for the Albanian language. Can you tell us about the development process?

It was fun! :laughs: That may sound extremely nerdy, but I don’t mind. It was really fun.

Working on this from Kosovo was a different experience than the time I spent in the United States, where in my last job I worked in a typical corporate environment. Previous to that I was in academia, and being able to work full time on a project that I loved, what can I say? It was thrilling.

I turned one bedroom of the house into an office (this startup was luxurious; no office garage)! I used a bit of my prior experience with developing large-scale full search engines, from my Masters program at NYU Poly School of Engineering, and the very valuable help of my mentor Prof. Torsten Suel, to create all the pieces needed for the Gjirafa engine: a multi-threaded crawler, an indexer, a query processor, and a few things in between. I developed a prototype that was not the best out there, but it was good enough and I was happy with the outcome.

The biggest limitations at the beginning were hardware and bandwidth, plus latency, and occasionally an algorithmic problem that kept me up at night. Later, two friends joined me as co-founders, and now we are working on making the engine even bigger and better. One co-founder, Ercan Canhasi, PhD, is working on the search engine, while the other co-founder, Diogjen Elshani, MS, is working on the business development side.

Why do you think competitors like Google haven’t focused on Albanian speakers?

Google hasn’t ignored the market completely. I think they’ll regret their absence.

The scalability of Google allows it to fit almost any market given enough data. But there are two problems here: (1) currently there is not enough data for the Albanian language on the web, and (2) the Albanian language is one of the most lexically unique languages in the world. Google can’t search something it doesn’t have; it can’t index information that currently does not exist on the web. As far as the language goes, Albanian is one of the few languages that does not derive from another language; it is a branch on its own. Processing a language intelligently means some knowledge is needed for that language. Linguistic research in English, and for a lot of other languages, exists. There is almost no linguistic research for Albanian that applies in this context. We are currently researching and developing Albanian grammar and syntax for NLP. We have done the groundbreaking work that will tie Albanian speakers together online, through their language.

Kosovo’s political situation has undoubtedly held back business development in the region. Do you see the situation as improved enough for the region to compete on a level with the rest of Europe?

It is true that the political situation in the region has set back development. But things have started to take a turn, and Kosovo and Albania are becoming emerging markets, especially in technology development. Based on our web mining data, the Albanian web is still in the early stages of development, but it has doubled in the past year and it is continuing to grow rapidly. That might not sound like much, considering that the whole web grows at the same rate, but the difference is that the Albanian web has been expanding its core economic value at a much greater rate than the average. It is developing, and that means there are enormous positive gains to be made across a huge range. The rest of Europe will not see its web experience improve by 200% in the next 2 years. Albania and Kosovo will see that kind of improvement. This web infancy is one of the reasons why the market has not been penetrated by global companies, which is why our project represents a great opportunity right now.

What’s your general strategy for marketing Gjirafa? Google has name recognition in search all over Europe. How can you compete with that position?

Our position rests on the unique services that we provide for users and that Google, and other competitors, do not. People need information that they currently cannot get online, and we feel that this market has been left behind, but they will be able to find it on www.gjirafa.com. Also, we will provide a targeted platform for merchants that will enable them to reach their customers. That aspect of the online economy is completely absent in Albania/Kosovo. Can you imagine that? It’s 1999 in online advertising there. Imagine what that means for the future. Our marketing strategy is diverse and a combination of several channels. Without going into specifics, we have a few marketing strategies planned for direct and indirect marketing.

 

Gjirafa is planning to launch its full text search engine in July of this year.
You can connect with Mergim via LinkedIn.

Irena Zatloukalova: Keep It Simple (For The Media)

StartupYard Mentor Irena Zatloukalova

On Wednesday, startup teams from StartupYard spent the morning and most of the afternoon in PR training. PR and internal communications manager Irena Zatloukalova, of Seznam, grilled each of the teams for several hours, walking them through the experience of having to pitch their companies, answering uncomfortable or difficult media questions, and crafting and selling a narrative to the media. Here were some of the takeaways from the session:

Journalists are People Too

Irena Zatloukalova should know something about journalists. As head of PR for Seznam, she deals with all kinds of them. The most important highlight of all of her experiences was this: journalists are people too. People know when they’re being treated fairly. They generally know when you’re lying, or when you’re not being completely honest. They know when they’re being used, and they resent it the same as anyone would. They also respond to positive inputs in all of the same ways that other people would: praise, trust, caring, and interest inspire journalists just as they inspire others.

Understanding Conflicting Motivations

Irena and Cedric kicking off the workshop

Zatloukalova attributed the sometimes tense relations with journalists, especially among entrepreneurs, to the conflicting motivations of publications and their editors on one side, and entrepreneurs on the other. As an entrepreneur or as a company, there’s a tendency to want to carefully craft a journalist’s take on your activities, and push a specific, self-serving narrative. At the same time, reporters have to justify, to their bosses and their readers, writing about a given company, or a given product. Often the interests of a journalist and a business are not perfectly aligned, and tension arises when a PR manager or a CEO is not able to accept those differences amicably: when the representatives of a company can’t respect the position a reporter is in. PR reps can form the destructive habit of “blacklisting” or cutting off disfavored reporters and publications for not toeing the company line, and they may also be tempted to distort the truth, or to lead journalists on with misleading intimations or false facts. This is a symptom of expectations that would be impossible to meet: that reporters be an apparatus of marketing, rather than a medium and means of communication.

Building a Story

 

Team Evolso gives a mini press-conference

To avoid these traps of poorly managed expectations and conflict, Zatloukalova talked about “building a story.” Story building is a way of approaching communication with media that keeps in mind that media will always form its own conclusions based on the information provided, and the impressions of the journalists themselves. Thus, three elements are key to getting media to do what you need it to do, and Zatloukalova suggested that startupers ask themselves these three questions:

Team Gjirafa in particular wants some of Seznam’s secret sauce

Is it News?

Is the story actually of interest? Is it something unique? Does it have import for the readers? Just because you want the media to talk about you, doesn’t mean they will. Many young companies can be tempted to see any information they give to the media as an enticing gift, when in fact they offer little of real substance or interest. It has to be news.

What are the Details?

This part is about curiosity. Facts make the story real, and they are the juiciest part of the story. Providing the media with facts makes the story real for them, and gives them something to present to their readers. Without statistics, exact figures, dates or percentages, your story’s context can be unclear. How important is this news to you? To your market? To the reader? To competitors? What do the numbers actually mean? The details lend credibility, and offer the media something they can use to justify their story as important, and meaningful. Without facts, there is no story.

Is This a Trend?

Finally, what does this piece of news say about something bigger than your company? Reporters love to find and tell stories that demonstrate a pattern or an emerging condition in the market, or in society in general, that has not been fully described before. If your product is beating a competitor that was thought unbeatable, this could be part of a new trend. If your users are interested in your product for a novel reason, that too could form the basis of a new and noteworthy change in the way things work. Trends can be small, restricted just to your market, or even to your own company, or they can be big; saying things about society, about your country, about the future, and about technology, art, and the economy.

Not Making Journalists Think

Zatloukalova also stressed the “Art of the Soundbite,” or the unique framing of a particular narrative your company is pushing, which expresses itself well in just a few words. The object when addressing the media is to speak in terms that are *evocative* without being too specific or conditional. The more a journalist evaluates what you say based on its internal logic, rather than on his or her own biases and experiences, the better off you are. So make these arguments and viewpoints interesting and memorable.

She gave examples like Apple’s “The World’s Thinnest Notebook” soundbite for the introduction of the MacBook Air, and Cedric Maloux, our director at StartupYard, added his favorite, also from Apple: “1000 Songs in Your Pocket.”

Don’t Describe, Evoke

All the teams had an opportunity to grill and be grilled. No one was spared in this workshop.

Evocative soundbites are those that make a strong statement, which forms a clear image in the mind of the journalist, which he or she can pass on to a reader. This process is one of positioning, as well as promotion. Zatloukalova gave the example of Seznam itself, pointing out that Seznam doesn’t speak in terms of itself alone, but evokes images that reporters are familiar with to contextualize the company: “Seznam: the only company in Europe competing on a level with Google,” or simply “Seznam is the Google of the Czech Republic.” These sorts of statements are strong, can be backed up with facts, and are easily understood and repeated. The simpler a statement is, the greater a chance it has of finding itself repeated and used again. As an editor, Zatloukalova will often take the writing of a marketing copywriter or a fellow PR rep, and remove, to their great frustration, all of the adjectives from the piece. The point in this should be clear enough: what is important is not your opinion by itself, nor how you wish people to see things, but rather statements of fact that can be argued convincingly. You can tell someone that your app is wonderful and innovative, but why should they listen? People listen to surprising and unexpected statements, even statements they don’t necessarily agree with.

One of the CEOs at the workshop voiced a doubt about this strategy. “The MacBook Air wasn’t the thinnest notebook in the world. What happens when your claim is only arguable?” But Zatloukalova pointed out that arguments of that kind aren’t particularly bad, for an established company or for a new one. If the media is arguing over or critiquing your claims, you’re in control of the conversation at a basic level: they are already talking in terms of how you see yourself.

What “Mentor Driven” Means To Us

What an Accelerator is For

A journalist visiting TechSquare this week asked me an intriguing question. I say “intriguing,” because as it was coming from an outsider to this business, it demanded a single answer to a question that is not often taken by itself: “what is a tech accelerator really for?” That kind of question demands an answer that applies to all parties: to the investors, to the startups, and to the general public. What do we do that adds value to the world in which we live? The answer I arrived at was this one, and I think it covers all of that: “a startup accelerator helps to manage, facilitate, and encourage intelligent risk taking.”

As Techstars has explained about their own roots, the current mold of accelerators was formed in reaction to risk aversion. Angel investors and VCs were, from about 2002 onward, inflicting far too much pain on startups to prove their worth before securing seed investments, which probably led more than a few worthy startups to stall out for lack of access to funds. The tech crash in the early 2000s had soured many investors on the market, and introduced big barriers to entry. Imagine a world in which Facebook didn’t have the money to get to its millionth user that first summer. This was a real danger at the time. But today,  a service that has added half a million users in a period of several months would be unlikely to have that particular fear. The accelerator movement has been an important part of that shift away from risk aversion, to more intelligent risk taking.

What “Mentor Driven” Means

 

Over the past month, our teams have met with nearly 40 mentors each. That’s 40 meetings with entrepreneurs, professionals from within their areas, and CEOs of companies that have been in the position that our founders are in now. There have been so many meetings that many of the teams have had moments of frustration with the process. One of the CEOs told me last week: “They all ask me similar questions, and I haven’t had time to do the things they’re all telling me I should be doing.”

Yes, it can be frustrating, but we also view that feeling as somewhat positive. A founder of a young company who is very aware of the potential problems he is facing is more likely to take a realistic approach to solving those problems, instead of avoiding them. He may be tired of hearing the same concerns, but he will definitely find ways of addressing them, if only so that he doesn’t have to keep hearing about them. He knows where he stands, and where he needs to be when this process is done.

Bad habits and false assumptions, when untested too long, can ossify very quickly, and poison sound decision-making. The accelerator is the antidote to that problem, forcing founders to address their toughest challenges first, rather than wasting time and money working in a market they don’t understand well enough. Constant early contact with mentors breaks up patterns of thinking and working that would lead founders astray.

It’s About Who the Mentors Are

“Mentor driven” means that the first steps a startup takes are in consultation with people who want them to succeed. Most of our mentors are not investors, and most will probably not end up working directly with any of our founders later on, but they are people who care about spreading knowledge, knowing their industry well, and making valuable and useful connections with each other, and with new startup founders. While basically all accelerators are concerned with helping their teams raise money at some point, at demo day, or later on, the focus at StartupYard is on giving the company the strongest possible foundation as a means to that end, and on making the company a success in general. Knowing and understanding your own industry, how people talk and behave, and how they think, are really vital elements of that kind of success.

Startups are Not in Business to Raise Money

A lot of startups quickly start thinking that they are in the business of raising money. That’s a cycle that’s easy to fall into. The second an investor wants to talk money, a founder has to completely change how he or she is thinking about the business, and fit that thinking to the way the investor thinks. If founders have conversations with investors too early in their own development, both as business people and stewards of their own companies, they can easily be taken in by the investor’s agenda, which is different, on a basic level, from their own.

A founder should be interested in his or her users, in solving problems for the people that will use their products, and in forming a company that adds value to the world in which they live. A good product or service company needs these goals above and beyond profitability in order to shape its future and give it purpose.

But an investor is only interested in realizing gains on their investments. If 1 dollar today can gain 20 tomorrow, they will invest. And likewise, if making a company stop and completely reconfigure its own priorities in order to win investment can turn 1 million dollars today into 20 million dollars next year, investors will encourage that to happen. So having a company planted on ground solid enough not to be shaken by incoming investment is very important. A founder has to have a vision of his company in 5 years. An investor doesn’t buy that vision, just the part of it that has an upside potential. We need investors to make many startups work, but that doesn’t mean investors should run startups, or tell them what they want too early in their development.

Mentoring can be a cure for that illusion. Talking to people who have taken on investments and regretted it, as well as those who have done it well and made it work, is an experience of great value to someone who has never had a conversation about money that involved more than 3 zeros.

But most importantly, mentors remind founders that their businesses have to work, not just as investment vehicles, but as *real* businesses. As I said: an accelerator is about taking intelligent risks. Putting 3, 6, or 12 months of your time into a company is in itself a risk. So why not make it an intelligent one?

Michal Illich: “Know your Competition.”

Michal Illich is a household name in the Czech Republic’s technology industry. Aside from developing the engine that originally powered Seznam, the king of search in the region, Illich has founded a raft of companies in the past 15 years. He’s a founder of StartupYard, as well as of Techsquare, the open tech workspace where StartupYard is based. He’s been mentoring our current startups, and we got him to weigh in on the state of the Czech Republic’s tech industry, and what it’s like to mentor new founders.
 
Michal, first things first: we hear you have a Tesla. Were you the first Tesla owner in Prague? How do you like it?

As far as I know, a few (up to 5) owners received their Tesla in the same week as I did. I might be the first because I opted for the earliest possible date. It’s a great car – beautiful, very powerful (4.2 seconds to 100 km/h) and still practical (5 seats, 2 trunks).

You’re one of the founders of TechSquare (homebase for StartupYard), and a founder and investor in StartupYard itself. What got you interested in bringing new startups to Prague?

Well, as I’m one of the first generation of Czech people who made some money from their internet projects, I thought it would be nice to give something back.

Czech and Central European investors are known for being conservative. Do you think that’s true, and if so, what unique challenges does that present startups here?

I’m not really sure if we are conservative. Most investors I know are realistic or optimistic about Czech startups. I don’t think that the American way of throwing a lot of money into startups and hoping that 1% will become a billion dollar company would work here. We are slower, but the long-term results of Czech IT companies are quite solid.

You’ve been mentoring the teams at StartupYard since the beginning. What do you find difficult about mentoring at this stage in these companies’ development? What about it is rewarding for you?

As Niels Bohr said, it’s hard to make predictions, especially about the future. No one – even the best mentors – can predict the success of any particular startup. So we search and discuss it together, which is interesting for the startup and for me as well.

Is there an area of preparation that the majority of accelerator teams could do better in?

Probably knowing their competitors and alternatives.

What are some projects you’ve been excited about recently? What are you working on?

Almost all the startups in the current batch are nice. We’re working on http://flowreader.com/ , http://testomato.com/ , http://kinohled.cz/ , some machine learning problems and one as yet unlaunched project.

How has the Tech Startup landscape changed in the past 10 years in the Czech Republic? What do you see coming in the future?

Of the Czech websites, only Seznam.cz is innovating. The other major players have done nothing technologically worth mentioning for several years :(. The global startups operated by Czech people are more interesting, and I think we’ll have more billion dollar companies (to accompany Seznam.cz, GoodData, AVG and Avast) in the next few years.

What does it take to Launch a Start-up: A Genius, or a Businessman?

Last week, our Director Cedric Maloux wrote about the trials and travails of hiring a CTO or a developer, for startups that have neither. Can a technology business thrive, even if it doesn’t revolve around a bona-fide technologist? Cedric is not exactly enthusiastic about the idea: “It depends,” he says.

Companies that started as purely technical projects, supported by little more than the geeky-obsessive interests of people who would only later be labeled “founders,” are legion of course.

Why Business Oriented Might be Bad

http://www.flickr.com/photos/thomashawk/7050489913/in/photostream/lightbox/

Extreme devotion to your ideals can produce amazing results in time.

Larry Page and Sergey Brin saw the genesis of Google as fodder for a research project, which they co-authored, replete with such fascinatingly unprescient comments as “We have designed Google to be scalable in the near term to a goal of 100 million web pages,” and “we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers,” and absurdly understated ones like “We are optimistic … that there is a bright future for search.”

Their technical single-mindedness and academic backgrounds even led them to conclude, as a result of their research into the structure of their proposed search engine (Google was then more a mathematical design concept than a business), that it would be best served by remaining an academic project, implying that it was too fraught with ethical complications to become a real business.

Ethical complications or no, however, Google did become a business. A rather enormous, complicated, and profit-motivated business. But would it have risen above the fold of a market then defined by get-rich-quick, morally flexible business approaches that were driven almost exclusively by profit incentive, rather than product and concept integrity?

It’s not an unfamiliar trajectory for a number of highly successful web projects that grew up, and became truly influential, following the tech bubble of the late 1990s.

Mark Zuckerberg, had he been anything but what he was, which can safely be described as not business oriented, would have viewed the potential market for social networking in 2003 with a jaundiced eye. The market was dominated by Myspace, an offshoot of a firm with an ignominious reputation for developing malware and spam delivery mechanisms centered around internet pornography sites, and had been built in a matter of weeks to explicitly ape the most popular features of Friendster, a less cynical, but perhaps less aggressive competitor. Myspace was then by some measures the most popular website in the United States (this was at a time in which the US still dominated international web traffic volume). It’s hardly a surprise that few were even considering taking on Myspace in its own market. Of course, Zuckerberg’s potential business partners had their own questionable judgement to consider.

Just as Brin and Page viewed search as fundamentally broken by the prevailing advertiser model, Zuckerberg clearly saw the social networking world as fundamentally broken for most of the same reasons. And yet his competitive nature drove him to enter the fray as an alternative that was based upon delivering desired results, with no eye to short-term profitability. This at a time when an investor would have seen Myspace’s market dominance as a strong disincentive to enter the fray. Indeed, even when Peter Thiel did sign on as a Facebook investor, he had to wait 5 years for the company to reach profitability. It had been dominant in its market for several years before it ever became cash flow positive. That’s not a bet most investors would make, and with good reason.

Why Business Oriented Might be Good

Technical co-founders can get a little... carried away.

The startup landscape of 2014 takes lessons from these successes, but should also pay equal attention to the drawbacks inherent in the approach that, while it worked in the long run for Brin, Page, and Zuckerberg, has sunk or failed to fuel the growth of thousands of would-be titans. For every Amazon, a company that manages, even if barely, to keep its revenues just above its costs as it continues to expand, there is a Pets.com, a company that loses more money the faster it grows.

I caught up with Damian Brhel this week, from Brand Embassy, a startup that entered StartupYard in 2011. When asked if a business-oriented approach is an asset, even to a fledgling startup, his answer was an enthusiastic yes (lightly edited):

Can a non-technical background be an asset for a founder of a tech company?

Damian: Sure, absolutely! Most startups fail because of sales, because they simply do not understand it. Maybe you can have an awesome product, but if you don’t know how to market it, you’re gonna fail. However, both have to be balanced – I cannot imagine a tech company without a co-founder who has tech insight. I’ve seen a few and they have no idea how to develop, or how much it will cost, and that causes huge inefficiency. Plus, it’s not just product + sales, it’s also many other ordinary things such as HR, formal bureaucracy, operations, finance management, intellectual property, law, etc. – these are areas that many startups are not aware of (and many tech guys don’t know them at all).

While it’s a popular trope to imagine that a technical wizard will accomplish the Zuckerbergian feat of creating a product that will somehow justify its own existence through sheer quality, mass appeal, or perfect timing, allowing the technical genius never to have to dirty his hands with business concerns, the realities of launching most products preclude that from ever being a reality.

In fact, the landscape of the technology business over the last 30 years suggests just the opposite: that the true titans had most of their greatest accomplishments thanks to brave and insightful business strategies, and thanks less to their technical accomplishments. It was the great triumph of Microsoft to license DOS to IBM, but they didn’t write DOS themselves. And it was the great accomplishment of Apple to push a consumer facing operating system in the 1980s, but the genesis of that idea came from Xerox PARC, where the technology had been around for nearly a decade, with Xerox failing to grasp its potential.

A survey of the Forbes top 100, selecting only for purely tech companies, bears out the notion that the biggest tech companies are built on strong business credentials. Samsung, though it’s been around in some form since the 1930s, didn’t become a true technology company until it was consolidated in 1987 by its Chairman Lee Kun-hee, an MBA from George Washington University. The story is much the same for Apple, Siemens, IBM, and others.

Exceptions are common. Dell, Cisco Systems, Google, and Oracle were all started by prototypical computer geeks, but a common thread emerges: most of these tech-founder companies experienced their initial growth in specialized markets, and expanded into new verticals later on, while the largest companies entered the market in several verticals, more or less at once.

And the biggest growth occurs when a team is being led by someone who can make business decisions that reach beyond the level of product, and deal with the company’s place in the market- whether that CEO is a founder or not.

No One Road to Success

 

Also no explanation for this man’s career.

Perhaps the singular key to success among these purely technical teams was excellence, combined with adaptability. As Jobs found out in the 1980s, and Zuckerberg may yet discover, the key to survival is to adapt, and making wise business decisions, inasmuch as they may compromise the vision that a singular founder has for his products and his company, is vital for continued growth.


How to Hire a Developer if You Know Nothing About Coding?

I was asked recently if we would accept a team without a technical co-founder in the accelerator program, and as of now, I am still struggling to give a clear-cut yes or no answer.

– “It depends”

Personality Matters

There are multiple parameters at play when starting and running a startup, and the personality of the founders is one of the most important ones. Before looking at the idea, we have to decide if the person in front of us is capable of turning this idea into a sustainable business. In the case of a technical co-founder we also want to be sure that they have the right skills to deliver a good product. Not all technical co-founders are born equal.

Pure business founders, as a team, can apply to StartupYard. If they are selected, one of their first goals will be to find how to deliver the goods. This is a disadvantage compared to a team with a technical co-founder but not an impossible one to overcome. There are two ways to approach the problem: go on a hunt to hire a technical co-founder or a full-time developer (for cash and/or equity) or sub-contract the development of your first version. There are a lot of talented coders out there who, under the right management and direction, will deliver your first version for a fraction of the price it would cost you to hire a full-time developer. Also, the time it will take you to find that full-time developer might be longer than initially expected, so you might want to get something out first while looking for your technical white knight. The difficulty, in either case, is to select them, understand how they work and make sure they are able to build what you have in mind.

How do you Hire a Developer or Technical co-Founder?

The first thing you want to find out when interviewing a developer, whether it is for a full-time position or for sub-contracting, is whether they have already built the same kind of product. The main reason you want to know that is the time-to-market of your startup. A developer who has already been faced with a similar system will know the caveats of one approach compared to another. Ask them for a sample of their work or a link to the service and see how it relates to what you want to develop. An online service is usually based on data you enter, store, manipulate, modify, search, display and interact with, so ask them how this other product relates to yours in terms of data manipulation and storage. You will save a lot of time and a lot of money if your candidate has done it before.

Tell them you would like to speak to 5 of their clients (in the case of a sub-contractor). A developer who has a good track record will have no problem providing you with the information. When contacting the former clients, ask them if they would hire the developer again and, if so, what they would pay attention to this time. If not, try to understand if the reason can be addressed by you.

Depending on the type of product your startup is about, it’s possible the developer will have to deal with some technologies they are not that familiar with. Ask them how many programming languages they are going to use and for each of them, ask them if they like this language and if they would consider themselves an expert at it or not (you can ask them to rate from 1 to 5 their knowledge of each programming language they are going to use). Programmers usually have one or two favorite languages, so be wary of developers telling you they are an expert at everything. Such developers do exist, though, so make sure they can prove it to you.

Communication Matters

Ask them how much they enjoy programming. Is it just a job or a passion? Do they participate in open-source projects? Like for any other position, the more passionate they are the better.

Developers are not world-renowned for being the most extroverted individuals (“Do I really have to talk to a person?” once joked a coder friend of mine), and you want to be sure that you will be able to get on well with them. Since you won’t understand what they are doing, you want to make sure you understand them when they explain to you what is happening in the software they write.

Ask them what makes a good developer and then ask them how this relates to them point by point. There are no good or bad answers; you just want to see how the person reacts when put in a defensive position and also how they consider themselves. Ask them how they handle delays in delivery of their software.

Show Them the Final Product

By now you need to have every single screen of your product designed in the form of wireframes or mockups and ideally documented. This is what they will use as a reference to build your product.

First, you want to see if they like your product or not and if they are going to be able to build it and in what amount of time. If they look at it and seem as excited as you are when you look at a pen on your desk, you might want to reconsider the person.

Looking at the screens, if you did your job well, your candidate should not have to invent what happens when a user clicks on a button or what happens when the user enters bad information of any sort. They should be able to concentrate on how to represent the data of your product and what needs to be done to it to go to the next state. For each screen ask them if they see something that they do not understand and see how confident they feel about making it happen. Ask them once again if they have done it before. If you hear that a lot of screens ‘will not be that easy because…’ ask them how they would approach the problem. See how confident you feel about their confidence.

Be nice! You will meet some fantastic people with a passion for writing computer code. Try to understand them and create a connection because when you find the right person with the right skills and the right attitude, great things can happen.


What I learned in 18 Months as an Email Marketer (Part 2)

Advertising to Sell Vs. Advertising to Learn

It’s quite common for a small company in ramen profitability to start treating every conversation with customers as a sales opportunity. That’s the right attitude, anyway. As I’ve mentioned here in the past, every interaction is a kind of sale. Either you are selling yourself to an investor, the media, or a partner, or you are selling your product to a customer. But just as you should always refine your pitch based on how well it has actually worked, you should do the same with your advertising. In fact, you can use advertising as a very cheap and effective form of research.

Shopping your Brand, and Testing Assumptions

Shopping your brand is sort of like focus-grouping, only you do it in the wild, and you use real customers as the focus group. The idea is to get a good sense of whether your target market is who you think it is, and if so, what that group responds to best.

You may run into some surprises doing this. The people most willing to buy your products may not always be the ones you’d expect. A classic industry example is women’s underwear: Victoria’s Secret has long known that male customers will spend more money, faster, than women buying the same products, and will be less interested in discounts and sales. And the reasoning is simple: men feel uncomfortable shopping for intimate items, and also don’t want to be caught looking cheap while doing it.

By the same token, though women may take longer than men to make purchases online, they also buy more electronics than men do, and consume more online media, and spend more time on social networks, taking the majority share in Facebook, Twitter, and Pinterest.

Catering to a demographic doesn’t have to mean talking down to that demographic either. You don’t see Apple making stereotypical “lady iPhones,” but their products are possibly more popular with women than with men. I can guarantee: Apple tests their taglines and slogans on women as much as they do on men. If you were selling a sleekly designed, high end electronics toy to the top of the market, would you care what women thought of your marketing?

Assumption testing can prove that your products are appealing to people you never considered potential customers, and for reasons you haven’t even thought of.

Case in Point:

Cedric Maloux, our CEO, created an app for iOS last year, and as soon as he had a working app, he “shopped” it using a number of targeted ads on Facebook. And since Facebook allows you to segment your market and target your ads to people based on sex, he decided that he wanted to know which taglines would work more with women, and which with men.

Because the nature of the app was targeted at a hobby that is overwhelmingly popular with men, he was surprised to find that women responded to the ads too. Not as much as men did, but there were key taglines that women responded to. There were ways of representing the app that appealed to women even more than they did to men. And by using the “advertise for research” approach, he was able to zero in on marketing that worked across these different segments.

And it isn’t just gender, either. There are loads of assumptions you probably make about your customers, and which you can test very effectively for little cost.

Agile Marketing: How it works

Agile methodologies don’t work every step of the way. There’s no set of iterations you can take in coming up with your products that will take you from no idea what to do, to a completed project. You need to be inspired first- and without that, there is no testing to do; no basis of comparison between a non-process and a good process. The same is true of agile marketing. It relies best on a seed of inspired thinking about your customers and your product, followed by rigorous and ego-free examination of what really works. I’ve done tons of marketing material that I loved, and only a small minority of it ever worked well. That makes marketing and coding not so different at all.

Start with Simple Questions

Like, what’s the first thing my customers see? Does it work? Does it work for the market I’m targeting? Could something else work better? Where am I losing customers?

 If your company already has a logo, then you probably have a slogan and associated taglines too.  There’s plenty of advice available about how to write them, and it isn’t actually that hard. I mentioned recently that a tagline or a slogan is more like a static element for a website. It’s fertile ground for A/B testing, and that’s what you should be doing- all the time, and not just for the homepage, but also landing pages for any campaign you run.
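As a rough illustration of what A/B testing a tagline involves, here is a minimal Python sketch. The variant names and the functions `assign_variant`, `conversion_rate` and `tally` are made up for this example and don't come from any particular tool. The idea is simply that each visitor is consistently shown one version, and you compare how each version converts.

```python
import hashlib

# Placeholder variant names, for illustration only.
VARIANTS = ["tagline_a", "tagline_b"]


def assign_variant(visitor_id, variants=VARIANTS):
    """Pick a variant from a hash of the visitor ID, so the same visitor
    sees the same tagline on every visit."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def conversion_rate(conversions, views):
    """Share of views that ended in the action you care about."""
    return conversions / views if views else 0.0


def tally(visits):
    """visits: iterable of (visitor_id, converted) pairs.
    Returns {variant: conversion rate} over the visits seen."""
    views = {v: 0 for v in VARIANTS}
    wins = {v: 0 for v in VARIANTS}
    for visitor_id, converted in visits:
        variant = assign_variant(visitor_id)
        views[variant] += 1
        wins[variant] += int(converted)
    return {v: conversion_rate(wins[v], views[v]) for v in VARIANTS}
```

Hashing the visitor ID, rather than picking at random on every page load, is one simple way to keep a returning visitor from seeing both versions.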

You can start with broad assumptions about your customers. Segment them into “likely,” and “unlikely” customer groups, and run a couple of different versions of ads and associated landing pages for each group. Create ads targeting both groups, and show them to both groups too. If your likely customers respond to the ads that target them, then you know you’re right about your market and your strategy. But if your unlikely customers respond more than you anticipated, you can continue to segment them, dialing in the specific messaging and the specific part of that market cohort that *is* responding to your marketing. You may find you’re sitting on a potential client base you never considered. And if your likely customers respond to your ads for your unlikely customers just as much as the ones targeted at them, then this may tell you that your messaging to these customers is not as effective as you thought it might be. You may have more opportunities to sell to this group with a different approach.
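The likely/unlikely comparison above boils down to a small two-by-two table: which group saw which ad, and how often they responded. Here is a hedged Python sketch of how you might tabulate it. The function names, the `expected_unlikely_rate` parameter and the wording of the notes are assumptions made for illustration, and a real campaign would need enough traffic for the differences to mean anything.

```python
from collections import defaultdict


def response_rates(events):
    """events: iterable of (audience, ad_target, clicked) tuples, where
    audience and ad_target are each "likely" or "unlikely".
    Returns {(audience, ad_target): click-through rate}."""
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for audience, ad_target, did_click in events:
        cell = (audience, ad_target)
        shown[cell] += 1
        clicks[cell] += int(did_click)
    return {cell: clicks[cell] / shown[cell] for cell in shown}


def interpret(rates, expected_unlikely_rate):
    """Turn the four rates into the plain-language readings described above."""
    notes = []
    own = rates[("likely", "likely")]
    other = rates[("likely", "unlikely")]
    if own > other:
        notes.append("likely customers prefer the ads aimed at them")
    else:
        notes.append("likely customers respond at least as well to the "
                     "other ads; revisit their messaging")
    best_unlikely = max(rates[("unlikely", "likely")],
                        rates[("unlikely", "unlikely")])
    if best_unlikely > expected_unlikely_rate:
        notes.append("unlikely customers respond more than anticipated; "
                     "segment them further")
    return notes
```

You would feed `response_rates` the raw impressions and clicks exported from whatever ad platform you use, then pass the result to `interpret` along with the response rate you had guessed for the unlikely group.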

Set Clear Goals

Using a system like OKRs, set objectives that involve clear answers to your simple initial questions. For instance, a question like “Is my purchase page losing sales because of the design?” can be associated with an objective like: “improve purchase page performance by 25%,” with key results being simple items like: “define the most effective call to action,” and “reduce distracting elements that cause users to bounce or navigate away.” Now follow the formula: testing incoming hits on the purchase page while cycling through these changes. You may find that something as simple as a stronger call to action can raise your conversion rate from 0.20% to 0.25%, and in terms of an online store, that can be an enormous increase in revenue.
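To see why a move from 0.20% to 0.25% matters, it helps to run the arithmetic. The sketch below is only an illustration: the monthly visit count is a made-up number and the function names are my own.

```python
def relative_lift(old_rate, new_rate):
    """Relative improvement of new_rate over old_rate (0.25 means +25%)."""
    return (new_rate - old_rate) / old_rate


def extra_orders(visits, old_rate, new_rate):
    """Additional conversions gained over a given number of visits."""
    return visits * (new_rate - old_rate)


old_rate = 0.0020         # 0.20% conversion before the change
new_rate = 0.0025         # 0.25% conversion after the change
monthly_visits = 100_000  # hypothetical traffic figure

lift = relative_lift(old_rate, new_rate)
gained = extra_orders(monthly_visits, old_rate, new_rate)
print(f"relative lift: {lift:.0%}, extra orders per month: {gained:.0f}")
```

A change of five hundredths of a percentage point sounds like nothing, but relative to where you started it is a 25% improvement, which is exactly the size of the objective in the example.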

Shocking how often this works.

You may well be shocked to realize the difference a single element makes in your overall revenue when it iterates itself over thousands of hits on a landing page or a homepage every day. There, a tagline that works just 1 time in 1000 more than another can mean the difference between life and death.

But you want to be doing this testing now- not when thousands of page views are already in play. That’s why small ad-buys on Facebook or other platforms can give you the intelligence you need to get ahead of these questions- before you realize you don’t have it right.


What I learned in 18 Months as an Email Marketer (Part 1)

When I took my first marketing job, I assumed that writing was writing, no matter what you were writing about. So I sat down, strapped in, and learned as much as I could about what it takes to write email copy, landing pages, and sales campaigns. Easy, I thought. Turns out, it’s nothing like normal writing at all. Here’s what I learned.

The Call To Action

“Don’t talk about what you want. Talk about what your customer is going to do.”

Last week we talked here about selling without selling. I’m going to reiterate the 7 principles that are mentioned in that post. These are the 7 elements that should be represented in some way, in every communication with a customer.

Trust: In you and the product. Let the customer do what they would normally do.
Understanding: Be as simple and clear as possible. The customer is not smarter than you.
Emotions: Use humor, use evocative words, show love and caring. Show passion.
What to do: Buy, sign up, click, share, read on
When to do it: Now? Later? Soon?
What I get out of it: Speak about effects of the product, not the features.
When it will happen: Examples, case studies, quotes, and testimonials

Last week we talked about the first 3. Now we’re going to talk about the middle two: what to do, and when to do it.

All of your marketing copy, the copy that is designed to lead a customer from contact with your website or marketing materials all the way to a purchase, has to inspire an action on the part of your customer- to draw the customer inexorably toward a purchase, a sign up, a click, a share, or any action that you want a customer to take.

What it Does

Your calls to action should always be just enough to inspire the viewer to take the next step he/she is comfortable with, and no more, engaging each user to the maximum of their comfort, without scaring them away by pushing them to a sale or action they aren’t ready for. This feeds back to trust: you should never be in the position of asking a customer to do anything he or she doesn’t understand.

If you think of your website or marketing approach as a pinball machine, your customer is the ball. Your marketing emails, landing pages, and homepage are all those little widgets and bumpers that keep the ball moving, and your copy is gravity. When the ball finally ends up falling into the hole, that’s a sale. Here’s the trick: pinball isn’t fun if the slope is too steep, and gravity is too strong. A satisfying sales experience, or any intended customer outcome, is one that feels natural, and rich for the customer. As the customer “bounces,” from one element of your online presence to the other, they should constantly be only in danger of heading towards a sale, never getting there unless they’re ready.

If you think of the customer as the ball, then you shouldn’t worry too much if they don’t go directly to a sale every time. The longer they bounce around and absorb what you do, and who you are, the more likely they are to eventually decide to buy. And the longer they take making the decision, the clearer it will be when they do buy that the product is right for them.

There’s rarely a reason to push: a customer will only ever buy when they are ready to buy, and not before. If you have a lot of hits on your purchase page, but few conversions, then you have probably already realized this: when confronted with a decision to buy, nobody is going to do it earlier than they expected, without a very good reason (that’s where discounts and promotions come in, but that’s another story).

What It Looks Like

Last month, I edited the StartupYard Homepage and a few other landing pages for copy. Our CEO Cedric Maloux had done the layout and the basic copy. Here’s what we started with:

What Are You Working On…?

Data

We’re looking for the craziest, most ambitious projects out there, having to do with the manipulation of large clusters of Data.

Search

Have you invented a new search algorithm to complete, or compete with, the big boys? We want to hear about it.

Analytics

Do you have a new, unique way of making sense out of Data? We have x TB of data for you to play with.

This copy is good, but ask yourself: what are you being asked to do? The first line has an open question. That’s a good challenge to go further. Now we get an explanation, under the Data header, of what we’re looking for: “the craziest, most ambitious projects.” Well that’s nice, but what does it mean? Why are you being told all this?

And in the next header under Search, we explain what we want. Nobody wants to know what we want, unless it involves giving them money. And since we actually do want to give you money, why not come out and make that part clear? The site visitor is not sure who this information is for: does it apply to them? Will they find something they are looking for? They are not being given a direction to go in, only information to absorb. That’s only half of what they need. Here’s the copy with calls to action.

[Image: the revised homepage copy with calls to action]

Now we have some strong direction. The visitor is being shown what we want, and being pushed to engage to their level of comfort. We have a call to connect in the second column, a challenge in the first, and an invitation in the third column. That’s three ways we’re inviting the user to go further right now. We don’t care how they do that: they can email us, they can read the rest of the homepage, or they can go right to the application. We make it clear here that they’re free to choose. They can seek their own level of comfort with the process.

While it seems superficial, these calls to action have powerful subconscious effects. In Blink, bestselling economics and sociology writer Malcolm Gladwell discusses widely known psycho-social experiments in which individuals are “primed,” with certain words before being placed in a social situation. For example, a person who is told to read a list of words including “patience,” “serenity,” and “calm,” is overwhelmingly more likely to wait patiently for her turn than a person who is told to read words like “aggressive,” “confront” and “fight.”

The words that an individual hears or reads have an enormous impact on how that person behaves in the short period afterwards. Words that prime for activity and exhort action will engender better results virtually all of the time.

Open for Business

Have you ever stood outside a shop or a restaurant, and not been completely sure that they were open, so you didn’t go in? Ever passed a restaurant because it looked closed, only to see someone come out and only then realize it’s open? We all have.

Now, think about those lit-up signs on shop windows; “Come in, We’re Open.” You might think you’re not affected by this kind of thing, but you are. Your confidence, inspired by this tiny little call to action, will empower you to stride directly to the front door and open it. No questions asked.

A homepage is no different. These kinds of calls to action are just a little tiny nudge saying: “We are here for you. This is about you. It’s not a blog, it’s not PR, and it’s not about us, it’s about you.”

A Good Call to Action is not Manipulative.

Sometimes marketers go a little nuts when they figure out that a call to action can be such a powerful thing. What would you do, for example, if I could promise you that I could increase your open rate on emails by 100% in one day? Would you want that?*

*This is a classic call to action, by the way: of course you want to increase your open rate by 100%. You are supposed to say yes.

What they can forget is that no matter how powerful a call to action, you can never sell someone who doesn’t want to be sold. Calls to action are nudges, not shoves.**

**And this is a classic trust-building rejoinder: I am telling you I understand your problem and I understand that the solution is not as easy as some say. I’m showing you that what I’m offering is better than the competition, and that I care about what’s best for you.

If I nudge you, you’ll scoot forward in the direction you want to go. If I shove you, you’ll go in the direction I wanted for a moment, and then start resisting me.

This is no different from the store analogy from earlier.

What kind of salesman do you trust? The one who waits for you to look around, then approaches you quietly and asks if you want help? Or the one who approaches you as you walk through the door and asks you what you’re looking for? The first salesman wants to help you. The second salesman wants to control you.

The first one gives you a choice: tell him what you are looking for, or say no, you don’t need help. Either way, you do what you want. The second makes a demand: tell me what you’re looking for, or break all trust, and lie, saying you are “just looking.” Either way, you are forced to choose between two options you may not like.

You may buy from the second salesman if you know exactly what you want, but you wouldn’t buy from him if you weren’t sure. You haven’t been given the space to get comfortable with his offerings before he pushes you to tell him what you want to buy. And if he is very good, and manages to get you to make a purchase before you’re ready, you would never, ever return to that store. Your embarrassment would be too great. You’d associate him, and that product you bought, with an experience you’d rather forget. This is not a win for anyone.

Online marketing is no different. People know when they are being invited, and when they are being pursued. And they don’t like being pursued in that context.

Do you get those emails from click-bait websites, with some video and a headline that says: “Drop Everything Right Now and Watch This or You Will Be Sorry,” or something to that effect? We’ve all been there. Or maybe you see those chain letters on Facebook that manipulate you into sharing some overly sweet piece of human misery, which turns out not to have been true in the first place?

Yes, they work. Once. And while you can get a lot of traffic from a sensational headline, and a big email open rate from a grabby subject line, that’s all you’ll get: traffic. Because people know when they are being manipulated, and they don’t like it.

This kind of marketing is for ad-driven spam sites, not for an honest product site that is actually offering people something they need. So don’t be a manipulative bastard. It doesn’t work anyway.


Calling all European Coders: What Could you Build with this Web Crawler Hadoop Database?

Last week we announced that Seznam.cz was opening part of its search technology by providing a cluster of data. Today, we are happy to give you more details.

Seznam.cz’s full-text search technology is based on Hadoop and HBase. Teams will have access to a test cluster of up to 100 million documents from the Internet, all of them pre-crawled and sorted into entities such as domains, webservers and URLs. Each of these entities has its own attributes for fast analysis and sorting of each web page in the cluster.

More specifically, the three entities are:

  • Domains – these are equivalent to the DNS name structure and are organized as a tree. The root entity is the special domain “.”.
  • Webservers – a “webserver” is a specialization of a “domain” (webserver = domain + port). Webservers gather URL statistics and other attributes related to a webserver as a whole (for example, the content of robots.txt is relevant at the webserver level).
  • URLs – a URL represents a document on a webserver. A “URL” is always related to some “webserver”. It contains all attributes relevant to a single web page.

Each entity has a key. The key looks like a modified URL: the hostname parts are in reverse order, and the rest of the URL is lowercased and cleaned up. It is possible to recognize an entity type from its key value. For example:

  • URL: http://www.montkovo.cz/Cenik/?utm_source=azet.sk&utm_medium=kampan11
  • URL-key: cz.montkovo.!80/cenik
  • webserver-key: cz.montkovo.!80
  • domain-key: cz.montkovo.
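
The key scheme above can be sketched in a few lines of Python. This is only an illustration that reproduces the single example given here, not Seznam.cz’s actual normalization code. The function names are hypothetical, and the cleanup rules (dropping a leading “www” label, dropping the query string, stripping a trailing slash, keeping “/” for an empty path) are assumptions inferred from that one example.

```python
from urllib.parse import urlsplit

# Assumption: the port in the key defaults to the scheme's standard port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def domain_key(host: str) -> str:
    """Reverse the hostname labels and end with a dot."""
    labels = host.lower().split(".")
    # Assumption: a leading "www" label is dropped, as in the example above.
    if len(labels) > 2 and labels[0] == "www":
        labels = labels[1:]
    return ".".join(reversed(labels)) + "."


def webserver_key(url: str) -> str:
    """Webserver = domain + port, written as <domain-key>!<port>."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"{domain_key(parts.hostname)}!{port}"


def url_key(url: str) -> str:
    """Webserver-key followed by the lowercased, cleaned-up path."""
    parts = urlsplit(url)
    # Assumptions: the query string is discarded and a trailing slash is
    # stripped; an empty path is kept as "/" so the key still looks like a URL.
    path = parts.path.lower().rstrip("/") or "/"
    return webserver_key(url) + path


def entity_type(key: str) -> str:
    """Guess the entity type from the shape of the key."""
    if "!" not in key:
        return "domain"
    _, _, after_port_marker = key.partition("!")
    return "url" if "/" in after_port_marker else "webserver"


example = "http://www.montkovo.cz/Cenik/?utm_source=azet.sk&utm_medium=kampan11"
print(url_key(example))        # cz.montkovo.!80/cenik
print(webserver_key(example))  # cz.montkovo.!80
print(entity_type("cz.montkovo."))  # domain
```

The real cleanup step almost certainly handles more cases than this (other tracking parameters, encodings, and so on), so treat the sketch as a way to read keys, not to generate them.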

The whole database is sorted by key in ascending order, so all URLs on the same webserver are co-located and can be processed one after another.
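
To see why the sort order matters, here is a minimal sketch that groups a sorted list of URL-keys by webserver in a single pass. The keys other than the montkovo.cz example are made up, the helper names are hypothetical, and a plain Python list stands in for a scan over the cluster.

```python
from itertools import groupby


def webserver_of(url_key: str) -> str:
    """Return the webserver-key prefix of a URL-key (everything before the path)."""
    return url_key.split("/", 1)[0]


def group_by_webserver(sorted_keys):
    """Group consecutive URL-keys that share a webserver.

    groupby only merges neighbours, so this relies on the input
    already being sorted by key, as the database is.
    """
    return [
        (server, list(group))
        for server, group in groupby(sorted_keys, key=webserver_of)
    ]


# Hypothetical URL-keys, deliberately out of order.
keys = [
    "cz.montkovo.!80/kontakt",
    "cz.example.!80/about",
    "cz.montkovo.!80/cenik",
    "cz.example.!80/index",
]

for server, urls in group_by_webserver(sorted(keys)):
    print(server, urls)
```

Because neighbours in the sorted order belong to the same webserver, per-webserver processing never needs to hold more than one group in memory at a time.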

Here is a list of common attributes for each entity:

Domain entity

  • Key
  • IP address of the domain (if exists)
  • Number of direct sub-domains
  • Number of all sub-domains
  • Number of all webservers in all sub-domains
  • Number of all known URLs (URLs related to all sub-domains). We call this URL state “key-only”.
  • Number of all downloaded URLs. State “content”.
  • Number of all processed URLs (i.e. parsed and extracted basic features). State “derivative”.
  • Number of redirects
  • Number of errors (i.e. URLs with downloading or processing error)
  • Average document download latency

Webserver entity

  • Key
  • Webserver homepage (key to that URL)
  • Content of Robots.txt (robot exclusion protocol) relevant to our crawler
  • Number of all known URLs (state key-only) related to this webserver.
  • Number of all downloaded URLs (state content) related to this webserver.
  • Number of all processed URLs (state derivative) related to this webserver.
  • Number of redirects
  • Number of errors
  • Average document download latency

URL entity

  • Key
  • URL as seen on the web
  • Last download date
  • Last HTTP status
  • Type of the URL – one of several (not downloaded, web page, redirect, error, …). Note: the type of the URL is not the same as the HTTP status. For example, the HTTP status may be 200 OK while the URL type is redirect, because we detected a software redirect within the page content.
  • Attributes specific for different URL types:
    • Not downloaded page
      • We have no explicit information about this page. Only factors that could be predicted (for example document language) and off-page signals (like pagerank) are available.
      • Prediction of document language
      • Prediction of explicit content (porn)
      • Pagerank – classic PR value calculated from link graph
      • Link distance from webserver homepage
      • List of backward links, each containing:
        • Key of the source page
        • Anchor texts relevant to this link
        • HTML title of the source page
        • Pagerank of the source page
    • Web page (i.e. downloaded page with regular content)
      • Alternative URLs for the page – each page could be presented under multiple different URLs. This is a scored list of those possibilities.
      • Detected document’s Content-Type
      • Downloaded content
      • Content version – date/time of content download. Could be different from the last download date (note: 304 Not Modified)
      • Major language – language identified as “most relevant” for this page. Could be different from the most frequent language on the page (different language for body text vs. menus)
      • Homepage – flag if this page is webserver’s homepage
      • Pagerank – classic pagerank value
      • Link distance of this page from webserver’s homepage
      • Derivative (attributes obtained by further processing):
        • Document charset
        • Detected languages on page with their frequencies
        • Explicit content flag – detected porn
        • Document title
        • Document <meta description …>
        • Document content parsed down to a DOM tree
        • Forward links found on the page
      • List of backward links. Each one has:
        • Key of the source document
        • Anchor texts (extracted from source document) relevant to this link
        • HTML title of the source page
        • Pagerank of the source page
    • Redirect
      • Target URL key
      • Homepage – flag that this redirect is part of a redirect chain to a webserver’s homepage
    • Error
      • The same info as for “not downloaded page”
      • We could provide more if needed, for example the date of the last download when the page was OK.
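
The difference between URL type and HTTP status noted in the list above can be illustrated with a small sketch. How Seznam.cz actually detects software redirects is not described here; the example below assumes one common form, an HTML meta refresh tag, and the status-to-type mapping and all names in it are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these statuses count as HTTP-level redirects.
# 304 Not Modified is deliberately not one of them.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class MetaRefreshFinder(HTMLParser):
    """Record the target of the first <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/new".
        _, _, rest = (attr.get("content") or "").partition(";")
        name, _, value = rest.strip().partition("=")
        if name.strip().lower() == "url" and value.strip():
            self.target = value.strip()


def url_type(http_status, html):
    """Classify a URL; http_status is None when nothing was downloaded."""
    if http_status is None:
        return "not downloaded"
    if http_status in REDIRECT_STATUSES:
        return "redirect"
    if http_status >= 400:
        return "error"
    finder = MetaRefreshFinder()
    finder.feed(html)
    # A 200 OK page can still be a redirect in disguise.
    return "redirect" if finder.target else "web page"


page = '<html><head><meta http-equiv="refresh" content="0; url=http://example.com/new"></head></html>'
print(url_type(200, page))  # redirect
```

JavaScript redirects and other tricks would need different detection; the point is only that the type is derived from the content as well as the status line.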

With all this data at your disposal, what could you build? The cluster will be updated and new entries can be added as per team requests. We are looking for the best ideas in the area of Data, Search and Analytics.

Wherever you are in Europe, we will pay for your flight ticket and your accommodations for 3 months in Prague so that you can participate in our accelerator program. Why don’t you start your application now?


If you have any questions about the database, enter them as a comment below.