For the sake of software

Wednesday, October 23, 2013

Application architecture - a general thought flow

Slightly deviating from semantic web, I would like to talk about application architecture.

To my mind, architecture is always about structuring concerns. Whose concerns? Those of the stakeholders, the people who have a vision of what they want. When you build a house, you have an idea of what you need. Your wife has something else in mind. Your kids add something more, and so on. The architect has to balance these concerns and create a structure that addresses them: a building plan in which each concern has its place. Beyond that, a good plan has a certain elegance, with the concerns balanced against a sense of aesthetics, symmetry and so on. All of this has to be balanced against the constraints. For example, cost could be a constraint, and maintainable flooring could be a concern. Choices such as colors, ventilation and lighting follow from these, along with minimizing wastage and increasing utility.

Balancing concerns, constraints and elegance within a structure is what architecture is.

In software, the concerns are different. A sponsor may say, 'I need to handle 5 million subscribers with sub-second latency when they access a certain feature'. Or the concern may be 'a highly usable interface that talks the native language of the user'. It could even be a schedule or cost input such as 'I need this before Christmas' when you are in June. That may severely constrain the choices you have.

As you can see, a concern is something a stakeholder feels is important about a feature. Further discussion may relax a concern, or help you find its acceptable range. In the example above, you may find on discussion that a latency of up to 3 seconds is still fine for the stakeholder, once you point out that the hardware cost is higher if latency must stay under 1 second.

Structuring the concerns and constraints into software systems and components is an art, as there is no single right way. Suppose you want greater flexibility: for example, changes to the data model should be possible in a jiffy through configuration, without any coding. This means the components involved must be coded in a data-agnostic way. The domain model is superimposed on the application rather than wired into it. Such flexibility often does not give the greatest performance. Thus, at the architecture level, you may allow a hybrid approach. Based on the type of transaction and the type of client accessing the system, traffic can take a high-performance route, which is a more tightly coupled implementation built on a specific technology stack, while a generic, loosely coupled, configurable implementation sits alongside it for other types of traffic.
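To make the hybrid idea a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the schema, the handler names, the 'bulk_order' transaction type and the routing table are all hypothetical and do not come from any real system.

```python
from typing import Any, Callable, Dict

# Hypothetical configuration: the domain model lives in data, not in code.
ORDER_SCHEMA: Dict[str, type] = {"order_id": int, "quantity": int, "note": str}


def generic_handler(record: Dict[str, Any], schema: Dict[str, type]) -> Dict[str, Any]:
    """Data-agnostic route: checks any record against a configurable schema."""
    validated = {}
    for field, expected_type in schema.items():
        value = record[field]
        if not isinstance(value, expected_type):
            raise TypeError(f"{field} should be {expected_type.__name__}")
        validated[field] = value
    return validated


def fast_order_handler(record: Dict[str, Any]) -> Dict[str, Any]:
    """Tightly coupled route: the field names are wired into the code."""
    return {"order_id": record["order_id"], "quantity": record["quantity"]}


# Hypothetical routing table keyed by transaction type.
FAST_ROUTES: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "bulk_order": fast_order_handler,
}


def route(transaction_type: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Use the specialised route when one exists, else the generic one."""
    handler = FAST_ROUTES.get(transaction_type)
    if handler is not None:
        return handler(record)
    return generic_handler(record, ORDER_SCHEMA)
```

Changing ORDER_SCHEMA changes what the generic route accepts without touching its code, while the fast route has to be edited by hand whenever the data model changes. That is the trade-off in miniature.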

What if we had allowed only the high-performance closed model? It may fly for the type of traffic we envisioned, yet it may increase the cost of maintenance when changes happen. Sometimes it may not be possible to introduce changes at all, and a rewrite may be required.

Thus, when all concerns are brought in, the system is pulled in different directions. Architecture provides the space to meet all the concerns in a way that is agreeable to the product visionaries while keeping future growth and change in mind.

Architecture brings malleability to the application when it is hammered with new possibilities. Little or no rewrite is needed, only additions at the right places, which have already been provided for. A good architecture cannot exist without a forward-looking view of the application's requirements. Architecture comes out of a lot of brainstorming rather than from a single individual handing it down. But the individual who dons the role of the architect needs to see the big picture and steer the direction without getting lost in unwanted concerns. The architect should allow the different possibilities to exist without losing track of the end goal. End goals can keep changing, so the architect should make sure the malleability survives implementation and remains available to handle changes.

Similarly, architecture cannot be done without understanding the technology, software and code. It is intimately related to what happens underneath. A large number of products never take off. Very few reach the customer, and very few of those are actually used in the end. Given this, it is important for the architect to focus on the key concerns rather than float on imaginary aspects. This may seem to go against what I said above about visualizing requirements. It is important to visualize, but it is equally important to keep it practical given the ecosystem. If you have a poorly skilled team, visualizing very broad themes, drawing up the architecture and realizing it in an implementation will be a hard problem. A sponsor who does not value the architecture and drives only short-term goals poses an even bigger problem for a clean, architecture-based implementation. For architecture to bloom, the sponsor should be a technology person at heart who knows the value of a solid architecture. It is a call taken based on the commitment to make a product, take it to market and expand on it.

Wednesday, August 7, 2013

HP Pavilion Sleekbook b172TX, Windows 8 and user experience

Recently bought this laptop.
Did not really need a touch based laptop, but ended up buying it.

For reading documents or PowerPoint slides, the touch screen is helping more than I thought. I am able to pinch and zoom and scroll through pages easily. I see touch mainly as a replacement for the mouse, and a little more. Gaming and paint-brush kinds of applications should be good to use with it, though I have not tried them and may not.

The display is good but not as brilliant as my MacBook Pro.

HP laptops, as I see with the 'Envy' series, seem to strike a middle ground between the purely tablet and purely laptop space by offering both touch and a keyboard. With both, at less weight and with smaller screen sizes, you have a better world to manage, especially if you are the coding kind who keeps typing away, or even the document kind. I still don't see the soft keyboard on the screen as a friendly thing for typing documents or PowerPoint slides. One good thing about this HP laptop is that it does not heat up. The keyboard is sleek and similar to the Mac's. I guess HP is going the Samsung way in trying to take on Apple's position with their laptops. However, I would still prefer a Mac any day for the integrated and thoughtful UI experience.

Coming to Windows 8, it is definitely an attempt, I would say, to get into a tile-based UI. But alas, you still have the old Windows desktop, which keeps popping up windows when you do something in a tile app. For example, when I use the Bing tile, search for YouTube and click, it switches to an Internet Explorer window to open YouTube. Crazy!
The same happens in Office 2013 PowerPoint. It seems to have a fresh new interface showing the file properties and all the metadata. However, when I click Save As, it opens the old browse window to save the file. The experience is divided between the new interface and the old one. The pop-up windows annoy me all the more since I started using a Mac, which manages things more as apps. It still has some minor pop-ups, but not on the scale of Windows.
The workspace concept, where you swipe your finger from the left of the screen to switch across the apps you are using (similar to Alt+Tab), is good.
But I feel Microsoft should move away from pop-up windows, so that all changes to a screen happen within the screen. I know they have a lot of smart engineers who can do this.

Sunday, October 7, 2012

Schema, data and semantic web

If you have not done a computer science course, and specifically one on databases, it is unlikely you will know the term 'schema'. Yet many of us, even people from a non-computer-science background, can tell what 'data' is.

What is the difference and why does it matter?

Data is all about the values of something. For example, when someone asks your height, you may say 170 cm. The 170 cm is data. The tag or name given to that value identifies what it is among a myriad of other values, such as the length of your sofa, which is also 170 cm. If you need to differentiate the values or classify them as something meaningful, you need an additional tag that describes what those values stand for. That set of names is the schema. So simple, isn't it?

Now I hear you say that you knew this, and that you have been sending emails to the business partner who makes T-shirts for you about the length and breadth of a T-shirt in cm. Yes, you have been using it implicitly. But consider a computer program written to perform some checks, say a check that the width of a T-shirt cannot be more than 100 cm. To work across the widths of different T-shirts, the program has to know the correct 'name' to use for the comparison. Otherwise the program will be hardcoded to look only for 100, and it will not work for other dimensions.
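The width check described above could be sketched in Python roughly as follows. The field names 'length_cm' and 'width_cm' and the sample T-shirts are my own hypothetical choices; only the 100 cm limit comes from the example.

```python
MAX_WIDTH_CM = 100  # the limit used in the example above


def width_is_valid(tshirt: dict, max_width_cm: float = MAX_WIDTH_CM) -> bool:
    """Look the value up by its name, so the check works for any T-shirt."""
    return tshirt["width_cm"] <= max_width_cm


# Hypothetical sample data: same names (the schema), different values.
tshirts = [
    {"length_cm": 70, "width_cm": 50},
    {"length_cm": 100, "width_cm": 120},
]

results = [width_is_valid(t) for t in tshirts]
```

Because the program asks for 'width_cm' by name, it never confuses a width with a length, even when both happen to hold the same number.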

Often in human communication, the schema is left unsaid for the reader to decipher. For example, I may say to my friend, 'Let us meet at the plaza to watch 'My cousin Vinny' at 7 o'clock in the evening'. This contains a lot of data, such as

plaza
My cousin Vinny
7 pm

As you can see, all of the above are data points in the statement. However, for a machine to process it, the machine has to go beyond the values and add tags that describe what each value is about, that is, the semantics of the information. The additional tags on the above would be

plaza - theatre

My cousin Vinny - movie

7 pm - time

With these additional tags we bring the schema to the statement: plaza is a theatre and My cousin Vinny is a movie. This kind of interpretation of the key elements helps computer software answer queries like 'What is the name of the movie?' or 'Which theatre is being talked about?'. More generally, it can span all statements that mention a theatre, to find out how many web pages refer to 'plaza' the theatre and how many of them relate plaza to My cousin Vinny.
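As a rough sketch of how tagged statements make such queries possible, consider the Python below. The first statement is the one from the text. The second statement and both helper functions are hypothetical additions, there only so the counts have something to work on.

```python
# Each statement is stored as tag -> value instead of free text.
statements = [
    {"theatre": "plaza", "movie": "My cousin Vinny", "time": "7 pm"},
    # Hypothetical second statement, made up for illustration.
    {"theatre": "plaza", "movie": "Some other movie", "time": "9 pm"},
]


def values_for(tag: str, tagged_statements: list) -> list:
    """Answer 'what is the <tag>?' across all statements."""
    return [s[tag] for s in tagged_statements if tag in s]


def count_matching(tagged_statements: list, **criteria: str) -> int:
    """Count the statements whose tags match every given tag=value pair."""
    return sum(
        all(s.get(tag) == value for tag, value in criteria.items())
        for s in tagged_statements
    )


movies = values_for("movie", statements)
plaza_mentions = count_matching(statements, theatre="plaza")
plaza_vinny = count_matching(statements, theatre="plaza", movie="My cousin Vinny")
```

Without the tags, a program would only see the string 'plaza' and could not tell a theatre from a shopping plaza.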

But you may wonder how on earth it is going to be possible to tag every statement and every word we speak, let alone the whole World Wide Web. Well, most web pages today carry 'data' about governments, companies, people and so on. They are currently published from databases or even Excel sheets, all of which have a very rigid schema. But in the process of turning them into HTML, the schema got lost.

So all this calls for is tools that preserve the schema during HTML publishing.

This is fine, but what about Wikipedia-like pages, which have a lot of textual content for human consumption?

There are efforts like DBpedia, which tries to automatically derive the semantic information represented in Wikipedia pages. Hence, it should not be difficult to bring back the schema of wiki pages.

This is obviously an effort, and a large one. But it is happening, and soon you may find the web of text looking like a web of data.

Tuesday, October 2, 2012

We are connected by data

Behind every social and business interaction there is data. To understand this statement, let me look at some examples.

You and your purchases

When you buy something, the data goes well beyond the amount and the shop's name, address and phone number. There is the warranty information, the maintenance contract and the service centre phone numbers. If you pay by EMI, there are the reminders to ensure you pay on time. If the warranty expires and you want to extend it, there are the dates. If you get free coupons, there are their details. If you bought it as a gift, there are the details of the recipient. If you need to ship it somewhere, there are the address, the phone numbers and the tracking details until the goods arrive... and so on.

You and your bank

In this case, sure, the bank maintains most of the details of your transactions and offers you a monthly statement or an online statement. But when you issue a cheque to someone, the reason you issued it is known only to you, and the same goes for the reason you received money. The details of a credit or debit card transaction also do not capture what was bought in a machine-readable form. For example, if I buy a laptop from an Apple store, the statement shows the Apple store but not the fact that I bought a laptop.

With these simple examples, you can see the connection. What you bought, along with some additional details, is available in the first case (with the shop). The details of all your transactions, not just with this shop but with all others and through other instruments (such as cheques), are available with the bank.

As an individual, I would definitely benefit if these two sets of data were linked so that together they make sense to me. But how do we make this happen? And how painless can it be for the end user?

The semantic web is one answer to this problem. By semantic web, I mean standards like RDF and Linked Data, which allow such linkages to be specified, provided vocabularies for the above data are available. Beyond that, every shop and every bank would have to publish their data using them. I feel that at least the online e-commerce portals could start returning such information as RDF/XML, which could then be reconciled with the bank's records, giving you the details of all your spending automatically.
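The reconciliation itself can be sketched in a few lines of Python. This is not RDF: plain dictionaries stand in for the published data, and everything here is an assumption for illustration, including the shared 'txn_ref' identifier, the amounts and the field names. In practice, agreeing on such a shared identifier and vocabulary is exactly the hard part.

```python
# Hypothetical data published by the shop: what was bought.
purchases = [
    {"txn_ref": "T1", "shop": "Apple store", "item": "laptop"},
]

# Hypothetical data held by the bank: how much was paid and how.
bank_transactions = [
    {"txn_ref": "T1", "amount": 1200.0, "instrument": "card"},
    {"txn_ref": "T2", "amount": 50.0, "instrument": "cheque"},
]


def link(purchase_records: list, transactions: list) -> list:
    """Attach the purchased item to each bank transaction via the shared key."""
    by_ref = {p["txn_ref"]: p for p in purchase_records}
    linked = []
    for txn in transactions:
        purchase = by_ref.get(txn["txn_ref"])
        merged = dict(txn)
        # The item stays unknown when the shop published nothing for this key.
        merged["item"] = purchase["item"] if purchase else None
        linked.append(merged)
    return linked


spending = link(purchases, bank_transactions)
```

The cheque transaction stays without an item, which mirrors the point above: the reason for a cheque is known only to you unless someone publishes it.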

Imagine how useful this would be for paying my taxes.

If the tax department could simply accept expenses in such a format and show the total against my income (assuming I am self-employed), would it not save a lot of energy for everyone, and of course a lot of paper and tracking?

The beauty of such linkages is that I can run through the data like a breeze and look up my spending in any category. The shop owner may be able to offer more discounts, as the loyalty information is there up front. And the income tax department could reward people with discounts, since the data is clean and available for scrutiny much more easily than before, saving lots of $$$ through more efficient tax collection.

That's the power of linked data.

This is just the surface... and if it happens, it can change your life and all our lives forever.

Thursday, September 20, 2012

What exactly is achieved by the Semantic Web?

The web is currently filled with documents. There are reams of English text that can be consumed only by humans. Blogs like this one add to the ever-increasing pile of text. Of course, there are also other types of content such as photos, images and videos. Thus the web is increasingly becoming a way of publishing content mainly for human consumption. The interesting aspect of these documents is that they are linked to one another meaningfully, enabling a user to traverse the hyperlinks and read all the linked content. For example, I point here to the W3C Semantic Web project. Thus there is no need to repeat what someone has already published; instead, one can simply link to it.

All this is good, and we could have lived like this happily. Then came Tim Berners-Lee, the inventor of the web. He saw that the web of documents holds a large amount of data: not just fancy content, but dates, numbers, text, currencies, you name it. If we could process this data, we could gain insight from the treasure trove that sits on the public web.

To achieve this, web pages should be published with additional information: the semantics, or meaning, of what is in the content of a page. This meaning can be seen as tags that extend the existing content of a web page. For example, there could be a string in the page that gives the name of the author of this blog as 'Thalapathy'. Other tags could mark a date on the page as the date on which the post was written. There could be tags that denote the comments on the blog, their dates and so on. And there can be tags that say the page is about 'Semantic Web'. Thus a page can carry innumerable pieces of data with additional semantics that a program can query.

If we make parallels to the database world, this is about looking at the whole web as one large database.

Queries can be run much as SQL is run on relational tables. This allows disparate data from several web pages across the web to be connected to answer a question. For example, an event reported in Bangalore on a Semantic Web conference can be related to a book released in California and its popularity in customer reviews on Amazon, because the author of the book attended the conference and the book is sold on Amazon, which in turn carries the reviews. This is not something that can be achieved with a simple Google search. It requires data to be related across seemingly disparate pieces of knowledge.
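Here is a toy Python sketch of that kind of query, with facts held as (subject, predicate, object) triples in the spirit of RDF. All identifiers and predicate names below are made up for illustration. On the real semantic web the facts would come from different sites, and a query language such as SPARQL would play the role of the matching function.

```python
# Hypothetical facts, as if gathered from different web pages.
triples = [
    ("conference:blr", "locatedIn", "Bangalore"),
    ("author:1", "attended", "conference:blr"),
    ("author:1", "wrote", "book:1"),
    ("book:1", "soldOn", "Amazon"),
    ("book:1", "hasReview", "review:1"),
]


def match(pattern: tuple, facts: list) -> list:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        fact for fact in facts
        if all(p is None or p == value for p, value in zip(pattern, fact))
    ]


def reviews_for_conference(conference: str, facts: list) -> list:
    """Follow conference -> author -> book -> review across the facts."""
    reviews = []
    for author, _, _ in match((None, "attended", conference), facts):
        for _, _, book in match((author, "wrote", None), facts):
            for _, _, review in match((book, "hasReview", None), facts):
                reviews.append(review)
    return reviews


found = reviews_for_conference("conference:blr", triples)
```

Each loop is one hop across what would be separate web pages, much like a join across tables in a database.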

Thus the Semantic Web opens up a whole lot of possibilities for humans, and for machines acting on their behalf, to see the web as an extended human consciousness that offers answers to what would otherwise have looked impossible.

Friday, September 14, 2012

Degree of structure to consider for organizing data

Let us examine, with examples, the degree of structure that exists in data exchanged between humans, and what it has got to do with organizing data.

Unstructured Data

Mostly English text in blogs, Word documents, emails and web pages. Only humans can make sense of this, and NLP tools to some extent.

Unstructured Data with annotations

English text in a Word document or a web page with a title, paragraph headers and other annotations, which give a human reader more meaning than plain text without annotations.

Semi-structured data

Data used in a business context. For example, emails exchanged as part of a business transaction can contain something like this:

Order ID: 1234
Order Date: 1/1/2012
Quantity: 5000 cps
Price per piece: 10 USD

The above data is more structured, but the structure is more discernible to humans than to machines. Humans can also interpret it differently, which makes it lean towards unstructured. In a way, it depends on the context of interpretation.
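A machine can pull the order lines above apart with very little code, as this minimal Python sketch shows. The function name is my own, and the parser assumes every line has the form 'name: value', which real emails will not always honour.

```python
def parse_key_value_lines(text: str) -> dict:
    """Split 'name: value' lines into a dictionary; other lines are skipped."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


email_snippet = """Order ID: 1234
Order Date: 1/1/2012
Quantity: 5000 cps
Price per piece: 10 USD"""

order = parse_key_value_lines(email_snippet)
```

Note that every value is still just a string. The program does not know that '10 USD' is an amount with a currency, or whether a date such as 3/4/2012 means March or April. That missing meaning is what keeps this kind of data only semi-structured.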

Excel sheet data also falls in this category, though I would say it is a little more structured due to the visual grid used to organize the data. Hence it is more rigid than the above arrangement.

Your bank statement also falls in this category. It is a report which, though generated from a highly structured database, is meant more for human consumption.

Structured data

I place XML in this category, since an XML document can conform to an XSD (XML Schema Definition). XML is meant for data exchange between machines, though being plain text it can still be read by humans. Hence I keep XML in the category of structured data, but not highly structured data as defined below.
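For comparison, here is how the order from the semi-structured example might look as XML, read with Python's standard library. The element and attribute names are hypothetical, since no real schema is given here, and note that xml.etree.ElementTree only parses the document. It does not validate it against an XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the order; names are made up for illustration.
order_xml = """<order id="1234">
  <orderDate>2012-01-01</orderDate>
  <quantity>5000</quantity>
  <pricePerPiece currency="USD">10</pricePerPiece>
</order>"""

root = ET.fromstring(order_xml)

# Every value is reached by name, and the unit is separate from the number.
order_id = root.get("id")
quantity = int(root.findtext("quantity"))
price = int(root.findtext("pricePerPiece"))
currency = root.find("pricePerPiece").get("currency")

total = quantity * price
```

Here the currency travels as its own attribute instead of being glued to the number, so the program can do arithmetic on the price without guessing.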

Highly structured data

In this case, I mean a proper database, which requires the special skill of data modeling to define the data and relations. This is used more by machines.

Another example is an LDAP directory. All of these require a pre-arranged data model expressed in a schema language.

Semantic web

This adds a layer of metadata to existing web pages to enable a machine to make sense of the content automatically. The expression of this metadata is highly structured, though the data itself can be unstructured. Thus it has the unique property of spanning highly unstructured to highly structured all in one go. For example, RDFS, which represents the ontology or the meaning, is highly structured, while RDF itself represents the information, that is, facts about the world.

In summary,

It is clear from the above that the more structured data is, the more easily it can be interpreted by machines, while the less structured it is, the more it is meant for human consumption. The key point is that even for humans we end up adding some syntactical and metadata-level aspects to make things clearer, without explicitly calling the metadata 'metadata'. If we explicitly call out or isolate the metadata from the data, it becomes more usable by machines and, in turn, more useful for humans as well.

Coming to what all of this has got to do with organizing data: it is increasingly clear from the above that the better the metadata (data that describes what the actual data holds) and the more it is available separately from the data, as in the semantic web or linked data concept, the better it is for both machine and human consumption. Machines can do better analysis and draw more insights using the metadata, and humans ultimately benefit from the data. If data and metadata are interwoven, as in a relational database, the result can be interpreted only by machines.

Wednesday, September 12, 2012

Organizing data

There is data everywhere today... more obvious and more in your face than before, thanks to the World Wide Web and smartphones. I remember that when I was a Unix and C programmer in the early nineties, we used Unix programs for chat and email. Our only interaction with computers was to write some C code. None of my relatives or parents, and hardly any of my friends, were even remotely using a PC. Mobiles were non-existent.

Now people download smartphone apps to organize their to-do lists, their contacts and even their jewellery collection. Organizing things is not new to humans. Entropy increases with time. Organizing is a discipline, and a free will hates discipline. I organize things in my house only when it is absolutely needed. When I file my tax returns, I run around for the proofs, letters, documents and so on. During the year, when a tax-related paper arrives, I dump it in a bin. When the bin overflows, I put the papers in a file meant for income tax. Often I don't have a file for a specific category. When I get my stock report from my broker, I don't have a file for stocks, so I file it under something called personal finance. My home loan repayment certificate goes into the home-related file. Then I often have to correlate my tax return with the home loan across these files. Linking pieces of information in physical form is not that easy. I did the right thing in keeping these things separate. But I do need some kind of link between them, so that when I file tax returns I know I also need to account for my home loan. And the home-related file contains a lot of other information, like home maintenance expenses.
Being self-employed is even more complex, as you need to track all your expenses methodically to apportion them between business and personal for claiming tax exemption.
Running a small business may be more complex still, with several interfaces to external vendors, partners and so on.
Running an enterprise...?

And leaving all this serious data behind, what about my blogs? What about the terms I searched for, the documents I read and the books I have? What about the emails I sent and received? What about my Facebook likes and LinkedIn updates? What about the spreadsheets I have in Google Docs or the slides on SlideShare? What about all the photos I took that are lying on my laptop and phones?

Should we even bother about organizing all of this data? Yes, for several reasons.

Imagine you are at a store and you need a copy of your passport to buy something
You need to know how much you spent this month on fuel
You need to find out if you already have the book titled 'Organizing data for dummies' before you make another purchase
You need to know the total money you made on consulting for someone
Or even the home address of a friend you plan to visit

Fundamentally, computers have taken over today's world, and increasingly so. They all keep and process information. Wherever you go, some form of data has to be fed into these machines and software to get more information or to do more stuff. That, broadly, is the case for thinking about organizing your data.

Also, if you need to share some information, you need to be able to find it.

Add to this the information assets that pile up with you day by day: the internet pages you read, the e-books, the photos, movies, audio and so on.

You cannot keep your sanity going forward, given the sort of information explosion that is round the corner.