Have you ever been in the situation where you’re looking for the best language for conceptual and logical modeling? I’ve been modeling for a while now, and I keep questioning whether the languages I use fit the purpose. Quite often it feels like the customer’s tools decide the language, and unfortunately this also limits the use of the models.

My customers have varied from large enterprises to small solutions, from local models to models supposed to span the enterprise, tens of systems, and even a small country. It is not likely that one language fits all, but it is interesting to think about the consequences of the choices we make as enterprise information architects.

It is also interesting to reflect on the cause of the so-called Y2K bug. It was clearly a modeling issue: the representation of the year was limited to only two digits. We are told that this was due to the cost of storage. Maybe the choice was made because the savings there and then would buy an extra PC? What was the total cost of ownership of the system due to that decision? And are we repeating the mistakes of the 80s? Are we selecting modeling languages from our current preferences, creating a future cost?

Let me start off by looking at the objective of logical information models.

Where logical information models were originally meant to hide implementation details from the user, the term conceptual model went somewhat further, abstracting models into human terms and relations.

Historically, information modelling first targeted databases (with e.g. EXPRESS and EER models), later object-oriented programs (with e.g. UML) and services (with e.g. XSD and WSDL). Then, through the 2000s, we have seen a major shift towards modeling information throughout the enterprise domain. The term Domain Model is sometimes used as a synonym for Logical Information Model when it covers more than one system.

We see that more data originates from outside the enterprise, and Big Data and Linked Data need to be part of the picture. We also see an increased need for agile information modeling, which calls for new languages and methods.

Problems arise when we use the languages and tools of the 80s and 90s to model information in the enterprise.

In the next blog post I’m going to dive into the strengths and weaknesses of the older and newer languages, and when they become a constraint rather than an opportunity. Later, I will look at the tools and their maturity and capabilities.

Big Data Live from Open World

Computas is present at Oracle Open World in San Francisco, and as last year, Cloud and Big Data are the big buzzwords. The industry pundits are still talking about the big V’s: Volume, Velocity, Variation and now also Value. They redefine many well-known trends as parts of Big Data: “event processing”, “sensor data”, “social software”. And it all comes across about as foggy as it can occasionally be in Fog City.

At the technology end of the scale, most of it revolves around NoSQL (in reality a technology based on Berkeley DB from the 80s) and Hadoop MapReduce (with a heritage going back to Google and Yahoo). It is about Java APIs and installing Oracle’s NoSQL database, but the message here says little about why Big Data is different from what we have done before.
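The MapReduce model mentioned above can be illustrated with the canonical word-count example: a map phase emits key-value pairs, the framework shuffles them by key, and a reduce phase aggregates each group. This is only a plain-Python sketch of the model, not Hadoop code:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

records = ["big data big deal", "data about data"]
counts = reduce_phase(shuffle(map_phase(records)))
```

The point of the model is that map and reduce are pure functions over key-value pairs, which is what lets a framework like Hadoop distribute them across many machines.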

Somewhere in between, however, you find the essence. Big Data is about the ability to collect and store all the information we create, and to use it coherently. Where we previously only kept the result of a transaction, we now keep all the steps leading up to a purchase or a decision. And for the first time we have technology that allows us to keep all the data we create. The big question then is whether we manage to exploit that data. Can we generate new knowledge from it? Can we find patterns? Can we understand the data? Can we understand the quality of the data? For this we need good analysis tools.

In the Big Data world it is a Data Scientist who is supposed to prepare the data and answer these questions, and there is broad agreement that the demand here will quickly outgrow the available competence. Technologically this means not only a major shift from processes to data, but also a shift towards using a set of techniques to achieve the desired results.

As a demonstration, Larry Ellison, CEO of Oracle, showed Big Data over the Twitter space: 4.9 billion tweets collected over 10 days were analysed together with structured information in real time to answer a hypothetical question about which Olympian would be the best to promote Lexus. That relatively simple question requires enormous data processing and a set of techniques. The data was broken down into a total of 27 billion statements, and a range of relations and techniques over structured elements of tweets, hashtags, re-tweeting and sentiment analysis were run on Exalytics and Exadata. The answer? Gabby Douglas, US gymnast.
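To make the shape of such an analysis concrete, here is a toy sketch of one of the techniques involved: scoring candidates by mention volume plus a naive lexicon-based sentiment bonus. The word lists, tweets and candidate names are invented for illustration; the real demo ran far richer relation extraction and sentiment analysis on Exalytics and Exadata.

```python
import re

# Hypothetical sentiment lexicons for the sketch.
POSITIVE = {"amazing", "gold", "love", "great"}
NEGATIVE = {"boring", "bad", "lost"}

def sentiment(text):
    """Naive lexicon-based sentiment: positive hits minus negative hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

def best_candidate(tweets, candidates):
    """Score each candidate by mention count plus a sentiment bonus per tweet."""
    scores = {c: 0 for c in candidates}
    for tweet in tweets:
        for c in candidates:
            if c.lower() in tweet.lower():
                scores[c] += 1 + sentiment(tweet)
    return max(scores, key=scores.get)

tweets = [
    "Gabby Douglas was amazing, gold again!",
    "Love watching Gabby Douglas",
    "Phelps lost his edge, boring race",
]
winner = best_candidate(tweets, ["Gabby Douglas", "Phelps"])
```

At the scale of billions of tweets, exactly this kind of per-tweet scoring is what gets distributed across a cluster.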


Semicolon methods to be used in Sweden

It has been a long time since I’ve posted on this blog. Most of my work in recent months has been related to the Semicolon project (www.semicolon.no). This is a Norwegian-funded innovation project where we’re helping the public sector in Norway with interoperability issues related to sharing information, with a focus on semantic and legal issues.

Computas’ work here has been related to opening up registries like the Norwegian Central Coordinating Register for Legal Entities, the Register of Company Accounts and the Semantics Register for Electronic Services. These have been published as Linked Open Data, the first two even with SPARQL endpoints.
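As a rough illustration of what publishing a registry as Linked Open Data means, each registry entry becomes a set of subject-predicate-object triples that anyone can match patterns against. The URI, organisation number and name below are hypothetical placeholders, and the plain-Python store is only a sketch of the operation that a real RDF store and SPARQL endpoint generalise:

```python
# Hypothetical URI and organisation number, for illustration only.
ORG = "http://data.example.org/organisation/123456789"

# A registry entry as subject-predicate-object triples
# (prefixed names stand in for full vocabulary URIs).
triples = [
    (ORG, "rdf:type", "org:Organization"),
    (ORG, "skos:prefLabel", "Example Org AS"),
    (ORG, "org:identifier", "123456789"),
]

def objects_of(store, subject, predicate):
    """Match a basic triple pattern: the building block of a SPARQL query."""
    return [o for s, p, o in store if s == subject and p == predicate]

label = objects_of(triples, ORG, "skos:prefLabel")
```

The win of this representation is that data from different registries can be merged simply by concatenating triple sets, since shared URIs identify the same entities.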

Over the last year we’ve worked on the “Semicolon method for publishing master data as linked data”. The Swedish Tax Administration (Skatteverket) applied for funding to open data from the Swedish company registries, using the «Semicolon-metod» as input. They were awarded a grant, so now we’re also going to help them open up company data. Interesting months ahead.


4 pieces of advice on becoming more semantic

Over the years I’ve worked on quite a few semantic projects, I’ve listened to hundreds of presentations, and I’ve held over 50 myself. A few things have become very clear to me in this time.

1. Know what is important to you, and what the ultimate solution is for you. If you collaborate with just one other party, XML is fine! If you have established standards with everyone you work with, use them! If you are in an “accidental collaboration” environment, you will need clearer semantics. And if you own public data, release it on the semantic web! You have to look at your own vision, business model and collaboration patterns.

2. Many, if not most, people talking about semantics have no hands-on experience. I never stop being amazed at seeing and hearing people talk about this without understanding where the state of the art is. A vision of the technology’s possibilities is not enough. You also need to know the limitations regarding standards, toolkit maturity, theoretical limits and applied use cases. Even more important is experience with more than toy examples, so you can compare against as-is technology.

3. Know the important differences between the approaches. Everyone is now calling their solution semantic. Choosing wrong may lead you onto a side track that does not fit your vision. Know what the world is doing, or you may end up with Minitel while the rest of the world uses the Web.

4. Know the steps to your goal. Which steps can be executed now, and which need more maturity from you, the standards, the tools and the developers? Know that executing early gives wounds, while executing late may cost you an opportunity. Know that you can engineer around some problems; others may be harder.

Short and simple: get informed and know what is right for you!

Sesam4 and Tourism

Sesam4, not to be confused with Sesame from Aduna, is a Norwegian research project with participation from Computas, Ovitas, Cyberwatcher, Vestlandsforsking and others.

One of the objectives of the project is to address the tourism sector with semantic web technology. The sector was chosen for a number of reasons:
• Large number of stakeholders
• Most of the service providers have small budgets for technology and integration
• A large amount of potentially interesting information, with a substantial percentage public
• Geolocated information
• High visibility

Mostly this means that it is a challenge to create good solutions with traditional technologies, and a great potential for semantic web technologies.

Through the first phases of the project we have defined some use cases and scenarios and identified some key challenges. My company, Computas, is one of the participants in the project. In the coming months we will focus on how to apply semantic web technologies in these scenarios. We’re starting this week with an off-site workshop with 30 people. More on the results will be posted later.

Anyone else applying semantic web technologies for tourism?

Semantics in the Norwegian government

The Norwegian Registry Centre, Brønnøysundregistrene, is implementing a metadata repository for the configuration of concepts, structures and messages exchanged between the private sector and the government.

The project, known as SERES2, has gone through several phases. It has clearly defined benefits: simplifying and reducing the reporting burden on enterprises, and harmonizing models for reuse. The use cases are also quite clear: merging existing message exchange models (e.g. XSDs) into common models, and thereafter producing new message exchange models top-down with clear semantics and structure, under a well-defined workflow and configuration management.

The repository, which is a central part of the final system, is based on a product from Adaptive Inc. that allows a custom metamodel to be defined, and instances of it to be imported and versioned in the repository.

First glance at a populated system will be at the Semantic Days conference in Stavanger May 18-20.

The Media Zone

This weekend eight of the semantic web Media Zone project team are gathered in a cabin in Hemsedal, Norway. There is no running water here, which means we have to heat water on the woodstove and cannot take showers. No problem. We do have electricity though, and television.

I often read trends from the behavior of my friends in situations over time. Of course I’m not assuming we are representative of the general population.

Interestingly, the TV has not been turned on the whole weekend. But we’ve seen a lot of media, on our iPhones and laptops. Our music has been streamed over the 3G network with Spotify. At least eight YouTube videos have been shared.

While I’m blogging about this on my iPhone, Magnus is playing the Norwegian anthem on his. Daniel is playing Ski Jump on the iPhone instead of trying the one outside the cabin. Frode is creating maps while skiing. Everyone is tweeting about what they are doing. And did I mention that Magnus, Daniel and I played poker against each other and other Facebook people in the car on the way here?

What did people do five years ago? Messages to the world would have been sent by SMS, we might have had a guitar, and this blog post would surely have been written in the cabin’s guest book. And TV would have been the preferred medium. I actually checked the guest book for March 21, 2004. They were only skiing…

I think I can safely say that in another five years we can still be without running water, but not without the Internet, and we will not notice that the TV is not there. The world is moving so fast that the rest will be guesswork.

Thanks for a great weekend Magnus, Pia, Frode, Daniel, Odd-Wiking, Christian and Robert!