Speak dialect, write English

I just had a discussion with a colleague on the state of languages in Norway. Norway has two official written languages, Bokmål and Nynorsk; the first is based on Danish, the second was artificially created from dialects in the 19th century. Most people write Bokmål, whereas some 20% (my guesstimate) have Nynorsk as their primary language in school. That said, the languages are quite close, and everyone can understand the other one.

So why not one written language? In the 1960s there was an attempt to merge the two into one written language called Samnorsk. At the time it was not successful, and the merger failed. Since then, however, the languages have grown closer, and at the same time the use of English as a second language has become much more widespread. So why not try again?

I’m a strong supporter of spoken dialects – and there are many of them in Norway (every valley has its own dialect). Because of this, people are used to hearing dialects constantly, as well as Swedish and other European languages. Dialects tell you where you’re from, and provide you with roots and belonging. With globalization we face the challenge that everything from children’s television to the Internet is produced in the most widespread (read: affordable) language. In practice this means that English children’s TV series and movies are dubbed into Norwegian Bokmål. The result is that children start to speak a normalized Bokmål that is quite far from their dialects. Within a generation we will lose the dialects and be left with one standardized spoken Norwegian close to Bokmål. At the same time, everyone also learns to read and write English.

Globalization is inevitable and allows us all to speak together, but why should we lose our spoken dialects in the process? Wouldn’t it be better if we all spoke our dialects (in addition to English as a second language), keeping our roots, and wrote English? We might lose our written languages (Bokmål and Nynorsk), but a written language is needed only to provide a common understanding – and English would serve better in a globalized world anyway.


Social networks in the enterprise

I have for a long time used various kinds of social networks and tools. IRC, Yahoo! messenger, Skype, Myspace, Facebook and Twitter to name a few. Common to them all is the ability to set a status on what I am doing.

In the enterprise, especially a knowledge-intensive one, it is this ability that is often lacking. You often find yourself in need of knowledge but don’t know whom to ask.

The latest tool currently sweeping through my company is Yammer. While very similar to Twitter, it is focused on the enterprise market, allowing people in the same domain to listen to one another. It also has Facebook’s ability to add groups.

In one week, more than 100 of some 175 employees have added themselves, and I’m looking forward to seeing how it will turn out. And a nice iPhone client doesn’t hurt. 🙂

My extended memory

I have one major challenge in my everyday life: My short term memory.

Beyond the “normal” apps on my iPhone (email, web browsing, contacts, calendar, iPod and phone), I have grown to enjoy a handful of “unconventional” apps that help me with this challenge.

Shazam is an incredibly simple app I use to look up music I hear around me, which I can later use to build up playlists.

SnapTell is a very similar app that lets me find metadata about books, videos etc. as I see them, by taking a picture. I regularly use the resulting list when buying them on the Apple store or Amazon.

With the WordPress iPhone client I can sit anywhere and post my thoughts at any time – often on the train, or, right now, in a workshop in Tallinn.

And with the NetNewsWire app I can follow other people’s thoughts and news whenever I want or am able to.

Common to all of these is that I don’t need to act there and then, and that I create metadata and context along the way. Any more apps like these?

Live from ESTC2008 – Semantic Web business idea contest

The business idea contest is a contest where entrepreneurs get 5 minutes to pitch their ideas to investors, with a €5000 prize. The contest showcased some good ideas:

  • Stefan Decker of DERI presents his idea around Sindice, a semantic web index: a crawled index of RDF that you can use in your application, with a business model of paid API access or micro ads. Sindice is already up and running, and I was not surprised that Stefan and his team won the prize. Congratulations!
  • Discoteka is a centralized store for metadata and ontologies for media.
  • D.o.o.m. SRL presents a semantics-based ranking system. The idea is good, but the pitch did not convince me.
  • Know who knows. Based on the mega-trends of globalization, specialization and decomposition, it combines social software, semantic technology, information retrieval techniques and data mining into social enterprise search. These guys took third place. It was a well-polished presentation, but it was unclear what was new compared to current knowledge management systems.
  • Webmark, an equivalent to the trademark for branding. Unfortunately the presenter did not manage to get his presentation running.
  • Emanuele Della Valle presented Squiggle, a semantic search engine, without convincing me.
It is nice to see ideas being pitched to investors; unfortunately the quality varied, and most did not convince me that their solutions were better than existing ones. The business model was also not very clear in most cases.

Live from ESTC 2008 – Semantic Search

Semantic search is one of the hot topics at the conference, although what the various players mean by it sometimes varies. The Friday morning keynote is from Hugo Zaragoza of Yahoo! Research.

Yahoo! currently has a platform for getting better and nicer results from your search engine. SearchMonkey is an open platform for using structured data to build more useful and relevant search results. It only changes the snippets (the result view), adding deep links, images, name-value pairs or abstracts by accessing the provider’s data sources.

Looking ahead at challenges, Hugo shows a Yahoo! search where every result is annotated with what it is (restaurant etc.); location-oriented results are shown on a map, with faceted filtering on extracted metadata. Some of his key quotes:

We move from a web of pages to a web of objects

Search is no longer about finding documents, but an interface for web mediated goals

Precision of navigational queries is solved

Document crawling and spam, indexing and retrieval, result relevance are not solved

How do you model intent?

– what is the right abstraction?

– what is the right granularity? / what ontology should we be using?

– what are the top intents?

And how do you measure relevance in the web of objects? What is the automated framework for relevance, and what are the ranking models that can attain it?

All these aspects are very relevant to what the semantic web tries to address.

Current research directions at Yahoo! Research in Barcelona cover MicroSearch, learning tags and searching objects. MicroSearch (from Peter Mika, a Yahoo! semantic web guru) simply goes automatically to the pages of a result set, extracts RDF (RDFa, GRDDL) or Microformats, and presents them: searching for Ivan Herman, for example, the result is shown with events in a timeline and addresses on a map. An example of the trouble with intent (funny enough, an example I often use myself) is a search for Paris Hilton. Should you model the hotel or the person?
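To make the extraction step concrete, here is a minimal toy sketch (my own illustration, not Yahoo!’s code) that pulls hCard microformat names out of an HTML snippet using only Python’s standard library; the real MicroSearch also handles RDFa and GRDDL, which this does not:

```python
from html.parser import HTMLParser

# Toy version of the MicroSearch idea: scan a result page for embedded
# metadata -- here only the hCard "fn" (formatted name) property.
class HCardNames(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_fn = 0   # nesting depth inside an element classed "fn"
        self.names = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "fn" in classes:
            self._in_fn += 1
            self.names.append("")      # start collecting a new name
        elif self._in_fn:
            self._in_fn += 1           # track nested tags inside an fn

    def handle_endtag(self, tag):
        if self._in_fn:
            self._in_fn -= 1

    def handle_data(self, data):
        if self._in_fn:
            self.names[-1] += data.strip()

page = '<div class="vcard"><span class="fn">Ivan Herman</span></div>'
parser = HCardNames()
parser.feed(page)
print(parser.names)  # -> ['Ivan Herman']
```

Once the metadata is out, rendering it as a timeline or map is a pure presentation problem.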

A second research direction is “learning to tag, tagging to learn”. His example is how Wikipedia pages carry both free text (the left side of a page) and metadata (the infobox on the right side). The idea is to combine the NLP on the left and the RDF on the right to create more information: NLP often loses relations but is good on types, while RDF is often good on relations but weak on types.
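As a toy illustration of that combination (the data and function below are entirely made up, not Yahoo!’s or Wikipedia’s actual pipeline), one can attach infobox-derived types to text-derived relations:

```python
# Relations guessed by NLP from the article's free text (weak on types).
nlp_relations = [
    ("Oslo", "capitalOf", "Norway"),
    ("Oslo", "locatedOn", "Oslofjord"),
]

# Precise types lifted from the infobox, RDF-style (weak on relations).
rdf_types = {
    "Oslo": "City",
    "Norway": "Country",
    "Oslofjord": "Fjord",
}

def typed_relations(relations, types):
    """Attach infobox types to both ends of each NLP-extracted relation."""
    return [
        ((s, types.get(s, "Thing")), p, (o, types.get(o, "Thing")))
        for s, p, o in relations
    ]

for triple in typed_relations(nlp_relations, rdf_types):
    print(triple)
# first line -> (('Oslo', 'City'), 'capitalOf', ('Norway', 'Country'))
```

The merged triples carry both the relation and the type, which neither side had alone.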

A last topic of research is ranking objects, another topic I’ve been struggling with myself. The simple background is that you can rank results in web searches, whereas in the database world there is no ranking. Why is this hard? Again, some quotes from Hugo:

Search (Information Retrieval) technology greatly surpassed Boolean queries in the 80s 

Attempts to improve search technology with semantic knowledge have repeatedly failed (except in very narrow domains). Effective query expansion is very difficult

“Entity ranking” refers to sorting entities by their relevance to a query. Hugo explains what he calls colored (typed) indexes and entity containment graphs. My oversimplified explanation is that the search looks for simple triples extracted from the text and ranks them.
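My oversimplified reading can be sketched as a toy in Python (my own illustration, not Hugo’s actual models): snippets are scored by query-term overlap, and each entity accumulates the scores of the snippets that contain it, which is the crudest possible take on “entities inherit relevance from their containing text”:

```python
from collections import defaultdict

# (snippet text, entities known to occur in it) -- invented sample data.
snippets = [
    ("Paris is the capital of France.", ["Paris", "France"]),
    ("The Eiffel Tower is in Paris.", ["Eiffel Tower", "Paris"]),
    ("Berlin is the capital of Germany.", ["Berlin", "Germany"]),
]

def rank_entities(query, snippets):
    """Rank entities by the summed query-term overlap of their snippets."""
    q = set(query.lower().split())
    scores = defaultdict(float)
    for text, entities in snippets:
        overlap = len(q & set(text.lower().rstrip(".").split()))
        for entity in entities:
            scores[entity] += overlap
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_entities("capital of France", snippets))
```

Real entity ranking is far richer (typed indexes, link structure, learned models), but the containment idea is visible even at this scale.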

A question from the audience relates to context, and whether Yahoo! Research is looking into it. The short answer is: not in Barcelona. He does, however, make one nice point on this: “How much context can we use before the user goes from being happy to being spooked?”

Live from ESTC 2008 – Ontology engineering

The 2nd European Semantic Technology Conference in Vienna is an industry outreach conference, and the most business-oriented conference in this field in Europe. On the first day I attended the ontology engineering tutorial.

The first part tried to capture the business case for ontology engineering in enterprises through Enterprise Information Management. Unfortunately I was left with a Catch-22 feeling: we need large projects to get the real benefit from ontologies, yet no-one will take the risk of using fairly unproven technologies for larger projects. It is hard to see this taking off in the enterprise in the short term. In my opinion one should focus more on what you gain from adding some semantic web elements into the enterprise fabric, and on where they are complementary.

The second part was a good walkthrough of the various methodologies of ontology engineering, also touching on ontologies built from wikis, games and tagging approaches. Also very interesting was a framework for estimating the effort of building an ontology: essentially a formula that evaluates the development cost given the size of the ontology, the domain complexity, the development complexity, the required quality and personnel competencies. The Ontocom framework has proven to be within 30% accuracy in 80% of cases. For an example ontology of 1000 concepts and properties, it estimates between 5 and 12 months depending on the other factors. The framework is based on lessons learned from 40 ontology projects, and counting.
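Estimation frameworks in the COCOMO tradition generally take the shape effort = a · size^b · ∏(cost drivers). The sketch below uses that general shape only; the coefficients and driver values are invented for illustration and are not the published Ontocom calibration:

```python
def estimate_effort_pm(kiloconcepts, drivers, a=8.0, b=1.0):
    """Person-months from ontology size (in 1000s of concepts/properties)
    and multiplicative cost drivers. a and b are made-up coefficients."""
    effort = a * (kiloconcepts ** b)
    for factor in drivers.values():
        effort *= factor           # >1 raises the estimate, <1 lowers it
    return effort

# Hypothetical cost-driver ratings for a 1000-concept ontology.
drivers = {
    "domain_complexity": 1.3,
    "required_quality": 1.2,
    "personnel_experience": 0.8,   # experienced team lowers cost
}
print(round(estimate_effort_pm(1.0, drivers), 1))  # ~10 person-months
```

Varying the drivers up or down is what spreads the estimate across a range like the 5 to 12 months quoted in the tutorial.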

The opening keynote by DFKI, however, was all too scientific for this conference – really not what is needed at this kind of event.