4 pieces of advice on becoming more semantic

Posted May 26, 2009 by norheimd
Categories: SemanticWeb

Tags:

Over the years I’ve worked on quite a few semantic projects, listened to hundreds of presentations, and held over 50 myself. A few things have become very clear to me in this time:

1. Know what is important to you! What is the ultimate solution for you? If you collaborate with just one other party – XML is fine! If you have established standards with everyone you work with, use them! If you are in an “accidental collaboration” environment you will need clearer semantics. And if you own public data, release it on the semantic web! YOU have to look at your vision, business model and collaboration patterns!

2. Many, if not most, people talking about semantics have no hands-on experience. I never stop being amazed to see and hear people talking about this without understanding where the state of the art is. A vision of the technology’s possibilities is not enough. You also need to know the limitations regarding standards, toolkit maturity, theory and applied use cases – and, even more important, you need experience with more than toy examples so you can compare with as-is technology.

3. Know the important differences between approaches. Everyone is now calling their solution semantic. Choosing wrong may lead you onto a side track that does not fit your vision. Know what the world is doing, or you may end up with Minitel while the rest of the world uses the Web.

4. Know the steps to your goal. What steps can be executed now, and what needs more maturity from you, the standards, tools and developers? Know that executing early gives wounds, while executing late may lose you an opportunity. Know that you can engineer around some problems; others may be harder.

Short and simple: get informed and know what is right for you!

Sesam4 and Tourism

Posted April 24, 2009 by norheimd
Categories: SemanticWeb

Tags:

Sesam4, not to be confused with Sesame from Aduna, is a Norwegian research project with participation from Computas, Ovitas, Cyberwatcher, Vestlandsforsking and others.

One of the objectives of the project is to address the tourism sector with semantic web technology. The sector was chosen for a number of reasons:
• Large number of stakeholders
• Most of the service providers have small budgets for technology and integration
• Large amount of potentially interesting information, with a substantial percentage public
• Geolocated information
• High visibility

Mostly this means that it is a challenge to create good solutions with traditional technologies, and that there is great potential for semantic web technologies.

Through the first phases of the project we have defined some use cases, scenarios and identified some key challenges. My company, Computas, is one of the participants in the project. In the next months we will focus on how to apply semantic web technologies in the scenarios. We’re starting this week with an off-site workshop with 30 people. More on the results will be posted later.

Anyone else applying semantic web technologies for tourism?

Semantics in the Norwegian government

Posted March 22, 2009 by norheimd
Categories: SemanticWeb

Tags:

The Norwegian Registry Centre, Brønnøysundregistrene, is implementing a metadata repository for configuration of concepts, structures and messages exchanged between the private sector and the government.

The project, known as SERES2, has gone through several phases. It has clearly defined benefits: simplifying and reducing the reporting burden of enterprises, and harmonizing models for reuse. The use cases are also quite clear: merging existing message exchange models (e.g. XSDs) into common models, and thereafter producing new message exchange models top-down with clear semantics and structure, under a well-defined workflow and configuration management.

The repository, which is a central part of the final system, is based on a product from Adaptive Inc. that allows a custom metamodel to be defined, and instances of it to be imported and versioned in the repository.

A first glance at a populated system will be given at the Semantic Days conference in Stavanger, May 18-20.

The Media Zone

Posted March 22, 2009 by norheimd
Categories: SemanticWeb

Tags:

This weekend eight of the semantic web Media Zone project team are gathered in a cabin in Hemsedal, Norway. There is no running water here, which means that we have to heat water on the wood stove and can take no showers. No problem. We do have electricity though, and television.

I often read trends from the behavior of my friends in situations over time. Of course I’m not assuming we are representative of the general population.

Interestingly, the TV has not been turned on the whole weekend. But we’ve seen a lot of media, on our iPhones and laptops. Our music has been streamed over the 3G network with Spotify. At least eight YouTube videos have been shared.

While I’m blogging about this on my iPhone, Magnus is playing the Norwegian anthem on his iPhone. Daniel is playing Ski Jump on the iPhone instead of trying the one outside the cabin. Frode is creating maps while skiing. Everyone is twittering about what they are doing. And did I mention that Magnus, Daniel and I played poker against each other and other Facebook people in the car getting here?

What did people do five years ago? Messages to the world would have been sent by SMS, we may have had a guitar and for sure this blog would have been written in the cabin’s guest book. And TV would have been the preferred medium. I actually checked the guest book for March 21, 2004. They were only skiing…

I think I can safely say that in another five years we can still be without running water, but not without the Internet, and we will not notice that the TV is not there. The world is moving so fast that the rest will be guesswork.

Thanks for a great weekend Magnus, Pia, Frode, Daniel, Odd-Wiking, Christian and Robert!

Speak dialect, write English

Posted February 26, 2009 by norheimd
Categories: SemanticWeb

Tags:

I just had a discussion with a colleague on the state of languages in Norway. Norway has two official languages, bokmål and nynorsk, the first based on Danish, the second artificially created from dialects in the 19th century. Most people write bokmål, whereas some 20% (my guesstimate) have nynorsk as their primary language in school. That said, the languages are quite close, and everyone can understand the other language.

So why not one written language? In the 60s there was an attempt to merge these into one written language called samnorsk. At the time it was not successful, and the merger failed. Since then, however, the languages have grown closer, and at the same time the use of English as a second language has become much more apparent. So why not try again?

I’m a strong supporter of spoken dialects – and there are many of them in Norway (every valley has its own dialect). Because of this, people are used to hearing dialects constantly, as well as Swedish and other European languages. Dialects tell people where you’re from, and provide you with roots and belonging. With globalization we are faced with the challenge that everything from children’s television to the Internet is written in the most common (read: affordable) language. In practice it means that English children’s TV series and movies are dubbed into Norwegian bokmål. The result is that children start to speak a normalized bokmål that is quite far from their dialects. In a generation we will lose the dialects and have one standardized Norwegian spoken language close to bokmål. At the same time everyone also gets to read and write English.

Globalization is inevitable and allows us all to speak together, but why should we lose our spoken dialects in the process? Wouldn’t it be better if we all spoke our dialects (in addition to English as a second language), keeping our roots, and wrote English? We might lose our written languages (bokmål and nynorsk), but a written language is needed only to provide a common understanding – and English would be better in a globalized world anyway.

Social networks in the enterprise

Posted February 10, 2009 by norheimd
Categories: knowledge, mac

Tags:

I have for a long time used various kinds of social networks and tools. IRC, Yahoo! messenger, Skype, Myspace, Facebook and Twitter to name a few. Common to them all is the ability to set a status on what I am doing.

In the enterprise, especially in a knowledge-intensive one, it is this ability that is often lacking. Often you find yourself in need of knowledge but do not know who to ask.

The latest tool that is currently sweeping across my company is Yammer. While very similar to Twitter, it is focused on the enterprise market, allowing people in the same domain to listen to one another. It also has Facebook’s ability to add groups.

In one week more than 100 of some 175 have added themselves, and I’m looking forward to seeing how it will turn out. And it is not a disadvantage that it has a nice iPhone client. :-)

My extended memory

Posted January 24, 2009 by norheimd
Categories: knowledge, mac

Tags:

I have one major challenge in my everyday life: my short-term memory.

Beyond the “normal” apps on my iPhone (email, web browsing, contacts, calendar, iPod and phone), I have grown to enjoy a handful of “unconventional” apps that help me with this challenge.

Shazam is an incredibly simple app I use for looking up music I hear around me, which I can then later use to build up playlists.

SnapTell is a very similar app that allows me to find metadata about books, videos etc. as I see them, by taking a picture. I use the resulting list regularly when buying them on the Apple store or Amazon.

With the WordPress iPhone client I can sit anywhere and post my thoughts at any time, often on the train or, right now, in a workshop in Tallinn.

And I can follow other people’s thoughts and news with the NetNewsWire app whenever I want or am able to.

Common to all of these is that I don’t need to act there and then, and that I create metadata and context. Any more of these apps?

Sublima @ ESTC 2008

Posted October 2, 2008 by norheimd
Categories: SemanticWeb

Tags:

For those of you who are interested, here is my presentation of Sublima at the European Semantic Technology Conference in Vienna last week: Estc-2008-norheim

Live from ESTC2008 – Semantic Web business idea contest

Posted September 26, 2008 by norheimd
Categories: SemanticWeb

Tags:

The business idea contest is a contest where entrepreneurs get 5 minutes to explain their ideas to investors, with a €5000 prize. The contest shows some good ideas that

  • Stefan Decker of DERI presents his idea around Sindice, a semantic web index: a crawled index of RDF you can use in your application. The business model is a paid API or micro ads. Sindice is already up and running, and I was not surprised that Stefan and his team won the prize. Congratulations!
  • Discoteka is a centralized store for metadata and ontologies for media.
  • D.o.o.m. SRL is a semantic-based ranking system. The idea of a semantic ranking system is good, but the pitch did not convince me.
  • Know who knows. Based on the mega-trends globalization, specialization and decomposition. Combining social software, semantic technology, information retrieval techniques and data mining into social enterprise search. These guys got third place. It was a well-polished presentation, but it was unclear what was new compared to current knowledge management systems.
  • Webmark, equivalent to the branding trademark. Unfortunately the presenter did not manage to get his presentation running.
  • Emanuele Della Valle presented Squiggle, a semantic search engine, without convincing me.
It is nice to see some ideas being pitched to investors. Unfortunately the quality varied, and mostly they did not convince me of how their solutions were better than existing solutions. Also, the business model was not very clear in most cases.

Live from ESTC 2008 – Semantic Search

Posted September 26, 2008 by norheimd
Categories: SemanticWeb

Tags:

Semantic search is one of the hot topics at the conference. What the players mean by it varies, though. The Friday morning keynote is from Hugo Zaragoza from Yahoo! Research.

Currently Yahoo! has a platform for getting better and nicer results from your search engine. SearchMonkey is an open platform for using structured data to build more useful and relevant search results. It just changes the snippets (the result view), adding deep links, images, name-value pairs or abstracts by accessing the providers’ data sources.

Looking forward at challenges, Hugo shows a Yahoo! search where every result is annotated with what it is (restaurant etc.). In this you see locations plotted on a map, with faceted filtering on extracted metadata. Some of his key quotes:

We move from a web of pages to a web of objects

Search is no longer about finding documents, but an interface for web mediated goals

Precision of navigational queries is solved

Document crawling and spam, indexing and retrieval, result relevance are not solved

How do you model intent?

- what is the right abstraction?

- what is the right granularity? / what ontology should we be using?

- what are the top intents?

And how do you measure relevance in the web of objects? What is the automated framework for relevance, and what are the ranking models that can attain it?

All these aspects are very relevant to what the semantic web tries to address.

Current research directions at Yahoo! Research in Barcelona cover Microsearch, learning tags and searching objects. Microsearch comes from Peter Mika, a Yahoo! semantic web guru. The approach simply goes automatically to the pages of a result set, gets the RDF (RDFa, GRDDL) or Microformats and presents it; e.g. when searching for Ivan Herman, the result is shown with events in a timeline and addresses on a map. An example of trouble with intent (funnily enough an example that I often use myself) is a search for Paris Hilton. Should you model the hotel or the person?
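To make the Microsearch idea a bit more concrete, here is a minimal sketch of pulling Microformats data (hCard) out of a result page. It only uses Python's standard library; the class name HCardExtractor, the function extract_hcards, the small set of properties and the sample page are all my own inventions for illustration, and this is certainly not how Yahoo! implements it.

```python
from html.parser import HTMLParser

# A handful of hCard property names; a real extractor would cover many more.
HCARD_PROPS = {"fn", "org", "locality", "country-name"}
# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardExtractor(HTMLParser):
    """Collects text from elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.cards = []    # one dict per element with class "vcard"
        self._stack = []   # per open element: its hCard property, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
        prop = next((c for c in classes if c in HCARD_PROPS), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Use the innermost enclosing element that carries a property.
        prop = next((p for p in reversed(self._stack) if p), None)
        text = data.strip()
        if prop and text and self.cards:
            card = self.cards[-1]
            card[prop] = (card.get(prop, "") + " " + text).strip()


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the HTML string."""
    parser = HCardExtractor()
    parser.feed(html)
    parser.close()
    return parser.cards


# Invented sample page fragment, standing in for a page from a result set.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Org</span>
  <span class="adr"><span class="locality">Oslo</span></span>
</div>
"""

for card in extract_hcards(SAMPLE):
    print(card)
```

A search front end could then put the locality on a map or list the organization next to the snippet. RDFa and GRDDL need proper RDF tooling and are left out of this sketch.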

A second research direction is “learning to tag, tagging to learn”. His example is how Wikipedia is moving from pure free text (on the left side of a page) to also offering metadata (the info box on the right side). The idea here is to combine the NLP on the left and the RDF on the right to create more information. NLP often loses relations but is good on types. RDF is often good on relations but weak on types.
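A toy sketch of that combination, under my own assumptions: pretend an NLP step over the free text has given us a type for each entity, and the info box has given us relations as triples. The function name combine and both data structures are hypothetical, chosen only to show how the two sources complement each other.

```python
def combine(nlp_types, infobox_triples):
    """Attach NLP-derived types to the subject and object of each triple."""
    typed = []
    for subj, pred, obj in infobox_triples:
        typed.append({
            "subject": subj,
            "subject_type": nlp_types.get(subj, "unknown"),
            "predicate": pred,
            "object": obj,
            "object_type": nlp_types.get(obj, "unknown"),
        })
    return typed


# Assumed output of an NLP step over the free text: good on types.
NLP_TYPES = {"Oslo": "city", "Norway": "country"}

# Assumed output of reading the info box: good on relations.
INFOBOX_TRIPLES = [
    ("Oslo", "capitalOf", "Norway"),
    ("Oslo", "locatedIn", "Europe"),
]

for row in combine(NLP_TYPES, INFOBOX_TRIPLES):
    print(row)
```

The second triple shows the gap that remains: the relation is known, but nothing in the text typed "Europe", so it stays "unknown".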

A last topic of research is ranking objects, another topic I’ve been struggling with myself. The simple background is that you can rank results in web searches, whereas in the database world there is no ranking. Why is this hard? Again some quotes from Hugo:

Search (Information Retrieval) technology greatly surpassed Boolean queries in the 80s 

Attempts to improve search technology with semantic knowledge have repeatedly failed. (except in a very narrow domain)

Effective query expansion is very difficult

“Entity ranking” relates to the sorting of entities by relevance to a query. Hugo is explaining what he calls colored (typed) indexes and entity containment graphs. My oversimplified explanation is that in the search you look for simple triples that are extracted from the text, and rank them.
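My oversimplified explanation can be turned into an equally oversimplified sketch. This is not Hugo's colored indexes or containment graphs; it just scores extracted triples by how many query terms they contain, and breaks ties by how often the same triple was extracted. The function rank_triples and the sample triples are invented for the example.

```python
from collections import Counter


def rank_triples(query, extracted):
    """Rank distinct triples by query-term overlap, then extraction count."""
    terms = set(query.lower().split())
    counts = Counter(extracted)  # how many times each triple was extracted

    def score(triple):
        words = set(" ".join(triple).lower().split())
        return (len(terms & words), counts[triple])

    # Keep only triples that share at least one term with the query.
    matching = [t for t in counts if score(t)[0] > 0]
    return sorted(matching, key=score, reverse=True)


# Invented (subject, predicate, object) triples, as if extracted from text.
EXTRACTED = [
    ("Paris Hilton", "type", "person"),
    ("Hilton Paris", "type", "hotel"),
    ("Hilton Paris", "locatedIn", "Paris"),
    ("Paris Hilton", "type", "person"),
    ("Oslo", "type", "city"),
]

for triple in rank_triples("paris hilton", EXTRACTED):
    print(triple)
```

Note that all three matching triples contain both query terms, so only the extraction count separates the person from the hotel. That is the intent problem again: the words alone cannot tell you which one the user means.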

A question from the audience relates to context, and whether Yahoo! Research is looking into it. The short answer is: not in Barcelona. He does, however, make one nice point on this: “How much can we use of context before the user goes from being happy to being spooked?”