Monthly Archives: September 2008

Live from ESTC2008 – Semantic Web business idea contest

The business idea contest gives entrepreneurs 5 minutes each to pitch their ideas to investors, with a €5000 prize at stake. The contest showed some good ideas:

  • Stefan Decker of DERI presented his idea around Sindice, a semantic web index: a crawled index of RDF that you can use in your application. The business model is a paid API or micro ads. Sindice is already up and running, and I was not surprised that Stefan and his team won the prize. Congratulations!
  • Discoteka is a centralized store for metadata and ontologies for media.
  • D.o.o.m. SRL is a semantic-based ranking system. The idea of a semantic ranking system is good, but the pitch did not convince me.
  • Know who knows. Based on the mega-trends globalization, specialization and decomposition, it combines social software, semantic technology, information retrieval techniques and data mining into social enterprise search. These guys took third place. It was a well-polished presentation, but it was unclear what was new compared to current knowledge management systems.
  • Webmark, equivalent to the branding trademark. Unfortunately the presenter did not manage to get his presentation running.
  • Emanuele Della Valle presented Squiggle, a semantic search engine, without convincing me.

It is nice to see some ideas being pitched to investors. Unfortunately the quality varied, and mostly they did not convince me that their solutions were better than existing ones. The business model was also not very clear in most cases.

Live from ESTC 2008 – Semantic Search

Semantic search is one of the hot topics at the conference, although what the players mean by the term varies. The Friday morning keynote is from Hugo Zaragoza of Yahoo! Research.

Yahoo! currently has a platform for getting better and nicer results from the search engine. SearchMonkey is an open platform for using structured data to build more useful and relevant search results. It only changes the snippets (the result view), adding deep links, images, name-value pairs or abstracts by accessing the providers' data sources.

Looking ahead at the challenges, Hugo shows a Yahoo! search where every result is annotated with what it is (a restaurant etc.). The results are placed on a map by location, with faceted filtering on the extracted metadata. Some of his key quotes:

We move from a web of pages to a web of objects

Search is no longer about finding documents, but an interface for web mediated goals

Precision of navigational queries is solved

Document crawling and spam, indexing and retrieval, result relevance are not solved

How do you model intent?

– what is the right abstraction?

– what is the right granularity? / what ontology should we be using?

– what are the top intents?

And how do you measure relevance in the web of objects? What is the automated framework for relevance, and what are the ranking models that can attain it?

All these aspects are very relevant to what the semantic web tries to address.

Current research directions at Yahoo! Research in Barcelona cover Microsearch, learning tags and searching objects. Microsearch comes from Peter Mika, a Yahoo! semantic web guru. The approach simply goes automatically to the pages of a result set, gets the RDF (RDFa, GRDDL) or Microformats and presents it; e.g. when searching for Ivan Herman, the result is shown with events in a timeline and addresses on a map. An example of trouble with intent (funnily enough, an example that I often use myself) is a search for Paris Hilton. Should you model the hotel or the person?
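
To make the Microsearch idea a bit more concrete, here is a minimal sketch of the extraction step: pulling hCard microformat fields out of a result page so they can be shown next to the snippet. This is not Yahoo!'s implementation. The class name HCardExtractor, the helper extract_hcards, the three chosen fields and the sample markup are all assumptions made for illustration, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class HCardExtractor(HTMLParser):
    """Collect a few hCard fields from HTML (illustrative, not a full parser)."""

    FIELDS = ("fn", "org", "locality")  # assumed subset of hCard properties
    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # never closed

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per element carrying class="vcard"
        self._open = []   # field name (or None) for every currently open element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.cards.append({})
        field = next((c for c in classes if c in self.FIELDS), None)
        self._open.append(field)

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        # The innermost open element that names a field owns this text.
        field = next((f for f in reversed(self._open) if f), None)
        if text and field and self.cards:
            card = self.cards[-1]
            card[field] = (card.get(field, "") + " " + text).strip()


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the HTML string."""
    parser = HCardExtractor()
    parser.feed(html)
    parser.close()
    return parser.cards


# Hypothetical result page; the person and organisation are invented.
page = (
    '<div class="vcard"><span class="fn">Jane Example</span>, '
    '<span class="org">Example Org</span>, '
    '<span class="locality">Oslo</span></div>'
)
cards = extract_hcards(page)
```

A search front end could then render each extracted card on a map or in a contact box. Deciding which of several candidate entities the user actually meant (the Paris Hilton problem) is a separate step that this sketch does not attempt.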

A second research direction is “learning to tag, tagging to learn”. His example is how Wikipedia holds information both as free text (on the left side of a page) and as metadata (the info box on the right side). The idea here is to combine the NLP on the left and the RDF on the right to create more information. NLP often loses relations but is good on types; RDF is often good on relations but weak on types.

A last topic of research is ranking objects, another topic I’ve been struggling with myself. The simple background is that you can rank results in web search, whereas in the database world there is no ranking. Why is this hard? Again some quotes from Hugo:

Search (Information Retrieval) technology greatly surpassed Boolean queries in the 80s 

Attempts to improve search technology with semantic knowledge have repeatedly failed (except in a very narrow domain)

Effective query expansion is very difficult

“Entity ranking” relates to the sorting of entities by relevance to a query. Hugo explains what he calls colored (typed) indexes and entity containment graphs. My oversimplified explanation is that the search looks for simple triples that are extracted from the text and ranks them.

A question from the audience relates to context, and whether Yahoo! Research is looking into it. The short answer is: not in Barcelona. He does however make one nice point on this: “How much can we use of context before the user goes from being happy to being spooked?”

Live from ESTC 2008 – Ontology engineering

The 2nd European Semantic Technology Conference in Vienna is an industry outreach conference, and the most business-oriented conference in this field in Europe. On the first day I attended the ontology engineering tutorial.

The first part tried to capture the business case for ontology engineering in enterprises through Enterprise Information Management. Unfortunately I was left with a Catch-22 feeling: we need large projects to get the real benefit from ontologies, yet no one will take the risk of using fairly unproven technologies for larger projects. It is hard to see this taking off in the enterprise in the short term. One should IMO focus more on what you gain from adding some semantic web stuff into the enterprise fabric, and where it is complementary.

The second part was a good walk through the various methodologies of ontology engineering, also touching on ontologies built from wikis, games and tagging approaches. Also very interesting was a framework for estimating the effort of building an ontology. Basically it is a formula that estimates the development cost given the size of the ontology, the domain complexity, the development complexity, the quality and the personnel competencies. The ONTOCOM framework has proven to be within 30% accuracy in 80% of the cases. As an example, an ontology of 1000 concepts and properties will take between 5 and 12 months depending on the other factors. The framework is based on lessons learned from 40 ontology projects, and counting.
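
Only the inputs of the formula are listed above, so the following is just a sketch of what such a parametric estimate can look like, assuming a COCOMO-style multiplicative form: a baseline effort, scaled by the ontology size and by one multiplier per cost driver. The function name, the driver names and every number below are hypothetical placeholders, not calibrated ONTOCOM values.

```python
from math import prod


def estimate_person_months(size_kilo_entities, cost_drivers,
                           baseline=1.0, size_exponent=1.0):
    """COCOMO-style estimate: baseline * size**exponent * product of drivers.

    All parameters are hypothetical placeholders; a real model would use
    values calibrated on historical projects.
    """
    if size_kilo_entities <= 0:
        raise ValueError("size must be positive")
    multiplier = prod(cost_drivers.values())
    return baseline * size_kilo_entities ** size_exponent * multiplier


# Made-up ratings: 1.0 is nominal, above 1.0 adds effort, below 1.0 saves it.
drivers = {
    "domain_complexity": 1.25,
    "development_complexity": 1.0,
    "required_quality": 1.0,
    "personnel_competence": 0.8,
}

# 1.0 kilo-entities = 1000 concepts and properties; the baseline is invented.
effort = estimate_person_months(1.0, drivers, baseline=6.0)
```

The point of the multiplicative shape is that the same ontology size can land anywhere in a wide range, depending on how the other factors are rated.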

The opening keynote by DFKI, however, was all too scientific for this conference… really not what is needed at this kind of event.

ISO 15926 and the Semantic Web

In beautiful Sogndal in Norway, a group of 30 knowledgeable people is gathered today for a seminar on the way forward with ISO 15926 and the use of OWL in this regard.

Matthew West, one of the key people behind ISO 15926 (why not give it a name?), gave some background and motivation behind ISO 15926, and how it tries to model 4D (objects exist in 3D and time) rather than pure 3D. He also addressed the advantages and disadvantages of modeling ISO 15926 in entity-relationship languages (e.g. EXPRESS and UML) versus description logic (e.g. OWL). The key take-away here is that OWL has superior tool support and can potentially represent the complexity of ISO 15926.

The effort of trying to represent ISO 15926 in OWL was presented by Martin George Skjæveland from DNV. ISO 15926 is represented in EXPRESS with two main constructs, entities and attributes. They have made a simple translation between EXPRESS and OWL:

EXPRESS construct -> OWL construct
Entity -> owl:Class
Subtype -> rdfs:subClassOf
Disjoint (one of) -> owl:disjointWith
Abstract -> owl:equivalentClass, owl:unionOf
Attributes with entity value -> owl:ObjectProperty, rdfs:domain, owl:cardinality
Attributes with datatype value -> owl:DatatypeProperty, rdfs:domain, owl:cardinality
Attribute values -> owl:allValuesFrom
EXPRESS datatype -> xsd datatypes
List -> linked list in OWL (?) – Drummond et al…
Unique -> not translated – exceeds functional datatype properties – exceeds OWL DL

Some issues still remain, but there seems to be almost a 1-to-1 mapping between the ISO 15926-2 EXPRESS schema and an OWL DL representation, which allows the use of Semantic Web languages and tools. A SPARQL endpoint has also been created over Part 2 of ISO 15926.
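
As a rough illustration of the translation rules above, the sketch below turns a small description of one EXPRESS-like entity into OWL statements in Turtle syntax. It is not the DNV translation. The function names, the example.org namespace and the entity "Pump" with its attributes are hypothetical, cardinalities are left out, and the strings are built by hand rather than with an RDF library.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix : <http://example.org/iso15926-sketch#> .\n"  # made-up namespace
)


def _all_values_from(cls, prop, filler):
    """Attribute value type -> owl:allValuesFrom restriction on the class."""
    return (f":{cls} rdfs:subClassOf [ a owl:Restriction ; "
            f"owl:onProperty :{prop} ; owl:allValuesFrom {filler} ] .")


def entity_to_turtle(name, supertypes=(), disjoint_with=(),
                     entity_attrs=None, datatype_attrs=None):
    """Apply the entity, subtype, disjoint and attribute rules to one entity."""
    lines = [f":{name} a owl:Class ."]                                   # Entity
    lines += [f":{name} rdfs:subClassOf :{s} ." for s in supertypes]     # Subtype
    lines += [f":{name} owl:disjointWith :{d} ." for d in disjoint_with]  # Disjoint
    for attr, target in (entity_attrs or {}).items():
        lines.append(f":{attr} a owl:ObjectProperty ; rdfs:domain :{name} .")
        lines.append(_all_values_from(name, attr, f":{target}"))
    for attr, xsd_type in (datatype_attrs or {}).items():
        lines.append(f":{attr} a owl:DatatypeProperty ; rdfs:domain :{name} .")
        lines.append(_all_values_from(name, attr, f"xsd:{xsd_type}"))
    return "\n".join(lines)


# Hypothetical entity, not taken from ISO 15926-2.
turtle = PREFIXES + "\n" + entity_to_turtle(
    "Pump",
    supertypes=["PhysicalObject"],
    disjoint_with=["Pipe"],
    entity_attrs={"connectedTo": "Pipe"},
    datatype_attrs={"serialNumber": "string"},
)
```

The resulting text could be loaded into an OWL tool or a triple store and queried with SPARQL. The List and Unique rows of the mapping are deliberately not covered, since the notes above mark them as open issues.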

The next presentation, “Building rich ontologies on OWL version of ISO 15926” by Johan Klüver from DNV, starts from the realization that domain experts use tools like Excel rather than knowledge modeling tools like Protégé. His position is to create “expert friendly interfaces” for domain ontology building, where users make statements about their domain. The idea is to use templates that compile statements down to ISO 15926 data structures.

In conclusion, it seems like some right steps have been taken in the direction of OWL. But what about the next steps in this bridge between ISO 15926 and OWL? Some issues are still not fully covered, among others namespaces, provenance, and representing Part 4 (the reference data libraries, or domain ontologies) in OWL. And last but not least, some more use cases for the ontology would be helpful.

Google using synonyms?

There is some talk in the blogosphere these days about Google adding synonyms to its search. Stemming – reducing the words you use to their base form or stem – they have had for a long time (e.g. run, running and runner give the same result set).
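
A minimal sketch of the stemming idea, assuming a deliberately naive suffix-stripping rule (this is neither Google's method nor a real algorithm such as Porter's, and the function names and sample documents are made up): both the indexed words and the query are reduced to a stem, so run, running and runner hit the same documents.

```python
def naive_stem(word):
    """Strip a few English suffixes; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "er", "s"):
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling: "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def build_index(documents):
    """Map each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for token in text.split():
            index.setdefault(naive_stem(token), set()).add(doc_id)
    return index


def search(index, query):
    """Look up a single-word query by its stem."""
    return index.get(naive_stem(query), set())


documents = {1: "she was running", 2: "a fast runner", 3: "wine from Porto"}
index = build_index(documents)
hits = search(index, "run")  # the same set comes back for "running" and "runner"
```

The rule is purely mechanical and needs no knowledge of meaning, which is why it is so much easier than synonyms.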

Using synonyms, however, is a much more complex task and relates to understanding the user’s intentions – including understanding more about the context the user is in. E.g. Port may be substituted by Gate, but also by Wine, and it gets even more complex as we take various languages into account – Gate in Norwegian also means Street.
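
The Port example can be sketched as a context-dependent lookup. Everything here is an assumption for illustration: the SENSES table, the context labels ("travel", "drinks", "norwegian") and the function expand_query are invented, and how a search engine would actually detect the context is exactly the hard part this sketch leaves out.

```python
# Hypothetical sense table: term -> context -> substitutable words.
SENSES = {
    "port": {"travel": ["gate"], "drinks": ["wine"]},
    "gate": {"norwegian": ["street"]},
}


def expand_query(term, context=None):
    """Return the term plus the synonyms that fit the given context.

    With no known context, every sense is a candidate, which is where
    irrelevant results creep in.
    """
    term = term.lower()
    senses = SENSES.get(term, {})
    if context is None:
        expansions = sorted({s for syns in senses.values() for s in syns})
    else:
        expansions = list(senses.get(context, []))
    return [term] + expansions


ambiguous = expand_query("Port")           # all senses compete
focused = expand_query("Port", "drinks")   # only the wine sense survives
```

Without context the expansion for Port mixes gates and wine; with a context it narrows to one sense, which is why knowing more about the user matters so much for synonyms.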

So is Google using synonyms, as indicated in a few articles referencing an official Google Blog article? Not today, from my understanding, but they are for sure looking into it as a central part of query understanding. Another Google Blog article explains this much more clearly.

Personally, I also believe that their move into the browser market with Chrome positions Google to gather more information about the user’s context, which is the real problem in current search solutions.

One Ring to rule them all…

Though I’ve always been a tech junkie – from Commodore 64 to OS X, iPods and Apple TV – I never really cared about mobile computing… I had my calls and my SMS. I always bought the newest gadgets, but the newer the phone I got, the harder it was to use… Was it me getting older? After two years in the U.S. I even started preferring voice mail over SMS (maybe because no one used SMS over there)… However, like all junkies, I cannot hold back, I need the newest… so while transiting through New York some half a year ago I got my iPhone (price was of course no issue).

What has happened since?

– I discovered location awareness in Amsterdam – accuracy not the point… just gimme the streets… the route to the restaurant.

– I discovered roaming charges in Brussels. Finally I had all my mail – answering all the time – where did my vacation go, really? Does more mail access decrease the size of your inbox? For sure it did not decrease the roaming charges (some 2000 NOK for just checking email).

– I finally had all my RSS feeds at hand – but I always seem to have 500 unread!

– I started blogging on the train – hmm, maybe I now could count my train rides as work hours? But I did not get more work done.

– Exchange integration means that I always have my calendar with me… However, now it seems to always be full…

– I’m involved in the development of a touch screen application. The iPhone is setting standards – generation iPhone will not accept “bad” solutions…

– On the positive side, I now more often leave my laptop at work, and I now book tickets using Safari on the iPod (WAP, what was that?). And like any true tech junkie I measure my gadgets’ coolness factor by how long they stay cool to me… and my iPhone is still very much so – apps apps apps! BUT it does affect my economy more than increased interest rates… I DOOO need a fixed data charge!

As a final note, for business I think the iPhone is a huge step – work gets more of my time because I allow it to. And for us consumers, I think it improves usability and usefulness by giving me a PC you can call with, not vice versa (and yes, I only care about connectivity from the network provider – not so-called value-added services beyond location awareness). Similar to what Nokia managed in the 90s, it’s just better than what was. However, it will keep all employees attached to work all the time, and as a side note, on my next vacation I will be switching to “Airplane mode”!

I now have one device that rules them all… unfortunately I am feeling more like Gollum than God…

(this post was made on the iPhone)