Trees and Hierarchies in SQL

This is my presentation about Trees and Hierarchies in SQL, which I gave at the Code Camp in FIZ Karlsruhe:

Distributed computing the Google way

This is my presentation about Apache Hadoop, which I gave at Java Forum Stuttgart and Herbstcampus 2010:


Why you should not deliver a client for web-services

Today I'd like to explain why a wise service consumer should not ask a service provider for a service client. Before I start with all the technical details, let me describe a sample scenario:
Team A and team B are working on two different applications. Then one day the product owner wants to integrate both applications, so team B has to access the data in team A's application. The architecture board of the company decides that, if possible, all applications should be integrated using web-services via SOAP over HTTP. So team A implements a web-service and delivers a WSDL file to team B. But after a short time team B complains that they cannot access the service and asks team A to deliver a client for the web-service. Does this sound familiar to you?

Usually team B acts that way because of a lack of time, because they don't know how to access a web-service, or sometimes simply because of politics. If team A argues that team B already has everything it needs to access the service, team A often loses the debate. In case of escalation, project managers, team leaders or product owners who don't have any clue about the technology usually decide that team A, in its role as service provider, has to deliver a working service that includes a service client.

Because of its weak political standing, team A usually decides to deliver an easy-to-use client. They generate stub classes from the WSDL file, package them with a web-service framework like CXF and configure the client with the endpoints for the different stages. Team B can now easily access the service, and in the short term this approach seems to have many benefits for team B:
  • no time invested in implementing the client
  • no need for web-service know-how
  • easy or no configuration of the client
  • client code is the responsibility of the service provider
But I think that in the long term such a decision is bad for both teams.

The problem is that every service provider uses its own technology and its own service framework. Even if you agree on one framework for web-services, sooner or later the framework will be upgraded to a newer version or replaced by another framework. You cannot avoid this; otherwise all innovation would stop. This means that every web-service client introduces a new set of libraries as dependencies, and if you go down this road, you will someday end up in dependency hell. The following diagram visualizes the issue:



Some people may argue that you can avoid this problem by using OSGi. In my opinion that is only partly right. First, you cannot migrate every existing application to OSGi. Second, in some scenarios, such as deployment with Java Web Start, you need to minimize the size of your application to keep network traffic down and startup time short. If every service brings its own dependencies, your deployment package will soon be very big. That is why I think that OSGi just hides the problem without solving it. But don't get me wrong: I think that OSGi is a very cool technology that can help you get your architecture and dependencies straight. In my eyes it's just not the answer to every problem.

I think that dependency management is one of the key aspects to avoid degeneration of your application over time. This dependency management is especially important if your application is part of a distributed system or SOA.
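To make the dependency hell concrete, the following Java sketch collects the libraries that each bundled service client brings along and reports every library that is required in more than one version. The client names, library names and version numbers are hypothetical, and the check only compares version strings; it does not resolve conflicts the way Maven or an OSGi container would.

```java
import java.util.*;

// Sketch: detect libraries that several delivered web-service clients
// require in different versions. All names and versions are hypothetical.
final class DependencyConflicts {

    /** Returns library -> versions for every library required in two or more versions. */
    static Map<String, Set<String>> findConflicts(Map<String, Map<String, String>> clients) {
        Map<String, Set<String>> versionsByLibrary = new TreeMap<>();
        for (Map<String, String> libraries : clients.values()) {
            for (Map.Entry<String, String> lib : libraries.entrySet()) {
                versionsByLibrary
                    .computeIfAbsent(lib.getKey(), k -> new TreeSet<>())
                    .add(lib.getValue());
            }
        }
        // Keep only libraries that are requested in more than one version.
        versionsByLibrary.values().removeIf(versions -> versions.size() < 2);
        return versionsByLibrary;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> clients = new LinkedHashMap<>();
        clients.put("customer-service-client", Map.of("cxf", "2.2.3", "jaxb-impl", "2.1.9"));
        clients.put("order-service-client", Map.of("cxf", "2.4.0", "jaxb-impl", "2.1.13"));
        clients.put("billing-service-client", Map.of("axis2", "1.5", "jaxb-impl", "2.1.9"));
        System.out.println(findConflicts(clients));
    }
}
```

Every entry in the result is a library for which the consumer has to pick one version and hope that all bundled clients still work with it.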

If your team consumes web-services, don't ask the service provider to deliver a service client. Instead, choose a web-service framework yourself and generate client stubs from the WSDL files of all the services you use. Use your energy and political influence to agree on the following points with the service provider:
  • versioning strategy for the service
  • change management process
  • service level agreement
I hope I have convinced you that a wise service consumer should not ask for a service client. What are your experiences with this topic? Did you find yourself in team A, team B or maybe both?

How does your organization handle this issue?


Overview of Continuous Integration Servers

Continuous integration is a software development practice where members of a team integrate their work frequently. Each integration is verified by an automated build to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly. This is an overview of currently available continuous integration servers for Java:
Is an important project missing? Suggest it in the comments!



    Using a Maven repository as a service repository

    A service repository is probably one of the most important components of a SOA. The service repository provides a single source of information about all services in an enterprise. There are a number of commercial repository products, but almost all of them are expensive and complex. Organizations that have just started with SOA may not need the full power of a commercial repository, and most of its functionality would likely remain unused. Furthermore, existing repositories do not do a particularly good job of managing dependencies between service consumers and service providers.

    Unlike monolithic applications, a SOA consists of a large number of frequently changing services. Through orchestration, most services depend on other services, which may in turn rely on yet other services. Managing and analyzing service dependencies is therefore essential for implementing a robust and maintainable SOA. That is why dependency management must be a key consideration in a SOA.
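    To illustrate what analyzing service dependencies can mean in practice, the following Java sketch answers one typical question: which services are affected, directly or transitively, when a given service changes? It assumes the dependencies are already available as a simple map from each service to the services it calls; the service names are hypothetical.

```java
import java.util.*;

// Sketch: impact analysis on a service dependency graph.
// Service names are hypothetical; the graph is given as service -> called services.
final class ServiceImpact {

    /** Returns every service that directly or transitively depends on the changed service. */
    static Set<String> affectedBy(String changed, Map<String, List<String>> dependsOn) {
        // Invert the graph: service -> services that call it.
        Map<String, List<String>> consumers = new HashMap<>();
        dependsOn.forEach((service, deps) -> {
            for (String dep : deps) {
                consumers.computeIfAbsent(dep, k -> new ArrayList<>()).add(service);
            }
        });
        // Breadth-first search over the inverted graph; the set also guards against cycles.
        Set<String> affected = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            for (String consumer : consumers.getOrDefault(queue.poll(), List.of())) {
                if (affected.add(consumer)) {
                    queue.add(consumer);
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = Map.of(
            "OrderProcess", List.of("CustomerService", "BillingService"),
            "BillingService", List.of("CustomerService"),
            "ReportingService", List.of("OrderProcess"),
            "CustomerService", List.of());
        System.out.println(affectedBy("CustomerService", dependsOn));
    }
}
```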

    Apache Maven has become a de-facto standard for dependency management when building Java projects. Besides that, its dependency mechanism can be used in any Ant build via the Maven Ant dependency tasks, so there is no need to migrate existing Ant build processes to Maven. Maven also has sophisticated repository support that allows you to set up custom central repositories as well as local repositories.

    Moreover projects like Apache Archiva (see my last post) further enhance Maven repositories with an easy-to-use user interface and add features such as LDAP integration or very fine-grained permission control.

    Why not use a Maven repository for storing artifacts, such as WSDL, schemas and policy files?

    Each service from a service provider can have its own POM file. Using the "deploy" goal, service providers can publish a WSDL file and other artifacts along with their POM file to a Maven repository. Service consumers can then download these artifacts based on the dependencies defined in their POM file and generate stub classes. In case of an incompatible change, a compile error will appear and service consumers may need to change their implementation. The entire process can be supported by the default Maven build process. The only required change is a custom artifact handler to support a new "service" extension and packaging type.
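    To see where such a published service ends up, it helps to know Maven's default repository layout: the groupId becomes a directory path, followed by the artifactId and version directories, and the file itself is named artifactId-version[-classifier].extension. The Java sketch below computes that path. The coordinates com.example.services:customer-service:1.2.0 and the "wsdl" extension are hypothetical, and timestamped SNAPSHOT file names are not handled.

```java
// Sketch of Maven's default repository layout.
// Coordinates are hypothetical; timestamped SNAPSHOT file names are not handled.
final class RepositoryLayout {

    static String path(String groupId, String artifactId, String version,
                       String classifier, String extension) {
        StringBuilder path = new StringBuilder()
            .append(groupId.replace('.', '/')).append('/')   // com.example -> com/example
            .append(artifactId).append('/')
            .append(version).append('/')
            .append(artifactId).append('-').append(version);
        if (classifier != null && !classifier.isEmpty()) {
            path.append('-').append(classifier);
        }
        return path.append('.').append(extension).toString();
    }

    public static void main(String[] args) {
        // POM and WSDL published for the same hypothetical service version.
        System.out.println(path("com.example.services", "customer-service", "1.2.0", null, "pom"));
        System.out.println(path("com.example.services", "customer-service", "1.2.0", null, "wsdl"));
    }
}
```

    Appending the result to the repository's base URL gives the location from which a consumer's build, or a person browsing the repository, downloads the artifact.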

    Additionally an existing change management process can be adapted for services so that service consumers are notified about new versions and their backward compatibility. Most organizations already have a change management process in place to handle changes in shared libraries. Furthermore products like Artifactory can send notifications as soon as a new artifact is deployed in a Maven repository.

    Using Maven for managing service dependencies has multiple benefits:
    • Ability to use full range of Maven dependency management capabilities including version ranges and snapshots.
    • Ability to use Maven dependency reports for dependency analysis.
    • Commercial repository products often provide a proprietary mechanism that can only be used for XML-related artifacts. Using Maven the dependency management is consistent with the dependency management process used for other artifacts, such as JARs, WARs, EARs etc.
    • Tight integration with any Maven or Ant-based build process.
    • There is no need for using UDDI or any other complex and proprietary APIs for publishing services.
    • Service consumers that do not use Maven or Ant can still participate in the process by downloading the artifacts manually through the user interface provided by Nexus or a similar product (see my last post).
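    Version ranges let a consumer declare which versions of a service contract it accepts. In Maven's range syntax, [1.0,2.0) means "at least 1.0 and below 2.0": a square bracket marks an inclusive bound, a parenthesis an exclusive one, and an empty side means unbounded. The following Java sketch checks a version against a single such range. It assumes purely numeric versions like 1.4.2; Maven's own comparison also handles qualifiers such as -SNAPSHOT, exact versions like [1.5] and unions of ranges, none of which this sketch supports.

```java
import java.util.Arrays;

// Simplified sketch of Maven-style version ranges such as "[1.0,2.0)".
// Assumption: purely numeric versions; qualifiers and exact "[1.5]" ranges are not supported.
final class VersionRange {

    static int compare(String a, String b) {
        int[] x = parse(a), y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;   // missing parts count as 0, so 1.0 == 1.0.0
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    private static int[] parse(String v) {
        return Arrays.stream(v.trim().split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Checks a version against a single range like "[1.0,2.0)", "[1.0,)" or "(,1.5]". */
    static boolean contains(String range, String version) {
        range = range.trim();
        boolean lowerInclusive = range.startsWith("[");
        boolean upperInclusive = range.endsWith("]");
        String[] bounds = range.substring(1, range.length() - 1).split(",", -1);
        if (bounds.length != 2) throw new IllegalArgumentException("Unsupported range: " + range);
        String lower = bounds[0].trim(), upper = bounds[1].trim();
        if (!lower.isEmpty()) {
            int c = compare(version, lower);
            if (c < 0 || (c == 0 && !lowerInclusive)) return false;
        }
        if (!upper.isEmpty()) {
            int c = compare(version, upper);
            if (c > 0 || (c == 0 && !upperInclusive)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A consumer that accepts any 1.x release of a hypothetical service contract.
        System.out.println(contains("[1.0,2.0)", "1.4.2"));  // true
        System.out.println(contains("[1.0,2.0)", "2.0"));    // false
    }
}
```

    If a provider follows a rule such as "incompatible changes only in a new major version", a consumer declaring [1.0,2.0) accepts compatible 1.x releases but not 2.0.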
    So there are many arguments why it makes sense to use a Maven repository as a service repository in a SOA. Let's see how we can realize that in practice:

    I think it's a good idea to separate the data types (XML schema) from the service interface (WSDL file). That way you can reuse your data types in different services. But creating a separate Maven project for every XML schema file is very fine-grained. Instead, I suggest creating one Maven project per business domain that packages all of the domain's XML schema files in a JAR file. I created a sample xsd-project as an example of this approach, which you can download to review the source code in detail.

    In addition to the XML schema files, we need a separate project for the WSDL file. This project needs to reference the XML schema files so that you can import them in your WSDL file. You can do this easily with the maven-dependency-plugin: it can download the JAR file containing the XML schema files and unpack them into a subfolder of your target directory. After you have finished editing the WSDL file, you need to package it and deploy it to your Maven repository. You can use the build-helper-maven-plugin to add the XML schema files from your target directory to the JAR file. For detailed information, you can download and review the configuration in my sample wsdl-project.
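    In the real build, the maven-dependency-plugin performs the unpacking through its own configuration. Purely to illustrate what that step produces, the following Java sketch uses java.util.zip to extract only the *.xsd entries of a schema JAR into a folder below target/. The JAR name customer-types-1.0.jar and the target/xsd folder are hypothetical.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

// Sketch: extract only the *.xsd entries of a schema JAR into a target folder.
// The JAR name and folder names used in main() are hypothetical.
final class SchemaUnpacker {

    static int unpackSchemas(InputStream jar, Path targetDir) throws IOException {
        int count = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(jar)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".xsd")) continue;
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {  // reject entries like ../evil.xsd
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                Files.createDirectories(out.getParent());
                // Copies the bytes of the current entry only; the zip stream stays open.
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path jar = Paths.get(args.length > 0 ? args[0] : "customer-types-1.0.jar");
        try (InputStream in = Files.newInputStream(jar)) {
            int n = unpackSchemas(in, Paths.get("target", "xsd"));
            System.out.println("Unpacked " + n + " schema files");
        }
    }
}
```

    The check against the normalized target path keeps a malformed entry name from writing files outside the target folder.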

    If you don't want to separate your XML schema files from the WSDL file, you can keep everything in a single WSDL file and use the "attach-artifact" goal of the build-helper-maven-plugin to deploy the WSDL file instead of a JAR file. That way you can access the WSDL file directly from your Maven repository without having to unpack a JAR file first.

    Finally, we need a Maven project that downloads the WSDL file from the Maven repository and generates stub files for the client or server implementation. You can use the maven-dependency-plugin once more to download the JAR file and unpack the artifacts. In my sample project I used Apache CXF to generate the stub files. Please download the sample project and review the code for detailed information on how to do that. More information about the cxf-codegen-plugin is available on the Apache CXF website.

    The maven-dependency-plugin enables you to analyze the dependencies of your projects. For example you can display the dependency tree for your project with the command "mvn dependency:tree".
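    The output of "mvn dependency:tree" lists each dependency with its coordinates and indents transitive dependencies below the dependency that pulled them in. The following Java sketch renders a hand-built tree in a similar style, showing how a service consumer's WSDL and schema artifacts appear as a chain of dependencies. The coordinates are hypothetical, and the tree is constructed in code instead of being resolved from POM files.

```java
import java.util.List;

// Sketch: render a dependency tree in a style similar to "mvn dependency:tree".
// All coordinates are hypothetical.
final class DependencyTree {

    static final class Node {
        final String coordinates;
        final List<Node> children;

        Node(String coordinates, Node... children) {
            this.coordinates = coordinates;
            this.children = List.of(children);
        }
    }

    static String render(Node root) {
        StringBuilder out = new StringBuilder(root.coordinates).append('\n');
        renderChildren(root, "", out);
        return out.toString();
    }

    private static void renderChildren(Node node, String indent, StringBuilder out) {
        for (int i = 0; i < node.children.size(); i++) {
            Node child = node.children.get(i);
            boolean last = i == node.children.size() - 1;
            out.append(indent).append(last ? "\\- " : "+- ").append(child.coordinates).append('\n');
            // Continue the vertical line only while more siblings follow.
            renderChildren(child, indent + (last ? "   " : "|  "), out);
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("com.example:order-client:jar:1.0",
            new Node("com.example.services:order-service-wsdl:jar:2.1:compile",
                new Node("com.example.services:customer-types:jar:1.3:compile")),
            new Node("junit:junit:jar:4.8.2:test"));
        System.out.print(render(tree));
    }
}
```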

    Another very useful plugin is the versions-maven-plugin. With this plugin you can display all dependencies that have newer versions available. Just use the following command: "mvn versions:display-dependency-updates".
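    Conceptually, this goal compares each declared dependency version with the newest version available in the repository. The following Java sketch shows that comparison under simplifying assumptions: versions are purely numeric, and the available versions are passed in as a hypothetical in-memory map instead of being read from the repository as the real plugin does.

```java
import java.util.*;

// Sketch of the idea behind "mvn versions:display-dependency-updates".
// Assumptions: purely numeric versions; dependency names and versions are hypothetical.
final class DependencyUpdates {

    static int compare(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    /** Returns "artifact current -> newest" for every dependency with a newer version available. */
    static List<String> updates(Map<String, String> declared, Map<String, List<String>> available) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> dep : new TreeMap<>(declared).entrySet()) {
            Optional<String> newest = available.getOrDefault(dep.getKey(), List.of())
                .stream().max(DependencyUpdates::compare);
            newest.filter(v -> compare(v, dep.getValue()) > 0)
                  .ifPresent(v -> result.add(dep.getKey() + " " + dep.getValue() + " -> " + v));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> declared = Map.of(
            "com.example.services:customer-service", "1.2.0",
            "com.example.services:order-service", "2.0");
        Map<String, List<String>> available = Map.of(
            "com.example.services:customer-service", List.of("1.0.0", "1.2.0", "1.10.1"),
            "com.example.services:order-service", List.of("1.9", "2.0"));
        updates(declared, available).forEach(System.out::println);
    }
}
```

    Comparing the version parts as numbers matters here: compared as plain strings, "1.10.1" would sort before "1.2.0".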

    Of course you can use Maven XDOC to document your service and generate HTML documentation using the maven-site-plugin. And I think you can find many other useful plugins that can help you manage and analyze your services and dependencies.

    As a side note, I'd like to mention that the Maven repository contains only releases and snapshots of your services. It does not contain the intermediate versions of your files during development. For this reason, it is a good idea to combine your SOA Maven repository with a version control system like Subversion, where you can store and compare your development versions before releasing a service interface.

    I suggest combining your version control system and Maven repository with a continuous integration server like Hudson. The Artifactory Maven repository, for example, allows very tight integration with Hudson through a plugin. That way you can deploy your build artifacts from your continuous integration server into Artifactory together with build environment information captured at deployment time and obtain a fully reproducible build.

    Last but not least I'd like to mention that some Maven repository products like Artifactory allow you to tag your artifacts and add additional properties. This enables you to store additional meta-information and use this information in your governance process.

    Of course the Maven-based approach does not do anything for managing and enforcing (including run-time enforcement) of service policies which is something that commercial registry/repository products can do quite well. So organizations need to evaluate the need for a commercial repository based on types and complexity of the policies they would like to enforce. In many if not most cases the policies are quite straightforward and the Maven-based solution becomes a viable option.

    I hope I was able to demonstrate why a Maven repository is well suited to serve as a service repository in a SOA. I'm very interested in your feedback and your experiences with service repositories or registries.

    How do you manage your services?


    Overview of Maven repository solutions

    Apache Maven is a popular build tool for Java projects. One of its main benefits is that it helps to maintain the dependencies required to build an application. Apache Maven encourages storing all software libraries in a central repository. Unfortunately, synchronization with the public repository is slow and unreliable, and sometimes the latest versions of some libraries are missing. Furthermore, private libraries cannot be uploaded there. By setting up an internal Maven repository, an organization can take advantage of the benefits of a Maven repository and bypass some of the shortcomings of the public repository. This is an overview of currently available Maven repository products:
    Is an important project missing? Suggest it in the comments!



    IT job trends - Which technologies you should learn next

    Recently a co-worker showed me a meta-search engine for jobs. Indeed.com is a search engine for jobs that allows job seekers to find jobs posted on thousands of company career sites and job boards. But for me the most interesting part of the Indeed website is the possibility to analyze trends on the job market. I tried to analyze some of the data and got some interesting results.

    EAI vs. SOA
    For example, since 2006 the market has needed more SOA experts than EAI experts. That is not a surprise. We can also see that since the end of 2008 the demand for SOA experts has stopped growing, while the next trend, "Cloud Computing", is emerging. Is it a sign that the topic is shifting, or is it just a coincidence?


    Next Hype: Cloud Computing
    We can see the growing "Cloud Computing" trend even better if we view the percentage of growth relative to each other.

    EAI, SOA, Cloud Computing Job Trends graph


    Java vs. .NET
    Maybe you ask yourself which programming language you should learn next. Should you learn Groovy, Ruby, Scala or take a look at Microsoft .NET? Well, according to indeed.com Java is the most wanted programming language on the market. C# is growing year by year but still does not seem to be used as much as C++. Groovy, Ruby and Scala currently barely appear in job postings.

    Are Groovy and Scala just a hype?
    But the relative growth of Groovy, Ruby and Scala over the past years is very impressive. Groovy and Scala in particular are growing very fast. Is it just hype, or are these the programming languages of tomorrow?

    Lightweight J2EE architectures are IN. EJB is OUT.
    If you compare EJB with Spring you can see that companies search for more Spring developers than EJB developers. Even if you compare Spring with J2EE - which is a more generic term - you can see that the numbers are nearly the same.

    Much more impressive is the relative growth of Spring over the past years. And the number of open jobs for Spring developers is still growing...

    Hibernate vs. EJB
    If you compare the numbers for Hibernate and EJB (or other persistence frameworks), you can see that Hibernate won the battle of persistence frameworks.
    EJB, Hibernate, JPA Job Trends graph


    But if you look at the relative growth, you can see that the future lies in JPA. By the way: Hibernate supports JPA!

    JSP and Struts still rock the world.
    And what about web frameworks? Well, JSP and Struts developers are still the most wanted. JSF, on the other hand, still needs some time to become a real standard.

    Maybe GWT is the future...
    None of the web frameworks is growing as fast as GWT. Is this the future of web development?

    JBoss is the only open-source application server used in production
    What about Java application servers? Oracle Application Server is the most wanted, but JBoss is growing fast. It seems that other open-source application servers like GlassFish or Geronimo are hardly used in production.

    Oracle Application Server, WebSphere Application Server, JBoss, GlassFish, Geronimo Job Trends graph


    But watching the relative growth, GlassFish seems to be the next superstar.

    Tomcat is the dominating web container
    Comparing Java web containers, we can see that Tomcat is dominating the market.

    But Jetty is growing much faster than any other web container.

    MQSeries is losing against TIBCO EMS.
    Looking at some popular JMS servers, we can see that TIBCO EMS is growing fast and MQSeries seems to be losing the game.

    Is ActiveMQ the future of JMS-Servers?
    But ActiveMQ is growing faster than TIBCO EMS, and if it continues that way, it could become the most wanted JMS server in the future.

    MQSeries, TIBCO EMS, ActiveMQ Job Trends graph

    Maven vs. Ant
    After comparing Maven with Ant, I was shocked that many companies still use Ant as the primary build tool.

    But I was relieved after taking a look at the relative growth. It shouldn't take long until Maven wins the game.

    Continuum and Hudson seem to be the most used continuous integration servers.

    TeamCity is growing fast, followed by Anthill.

    And unsurprisingly, Eclipse is the most used IDE.

    Swing vs. SWT
    If you are a Java rich client developer, you should know how to develop with Swing. SWT does not seem to be a real threat to Swing.


    Flash vs. Silverlight
    No surprise: Flash does not only dominate the web, but the job market, too. Though Silverlight is growing...
    Flash, Silverlight Job Trends graph

    And Silverlight's growth on the job market is very impressive:

    Flash, Silverlight Job Trends graph



    Conclusion
    According to indeed.com, you are currently the most valuable IT resource if you are a Java developer with Spring and Hibernate knowledge. You should know how to develop web applications with JSP, Struts or Web Flow and how to deploy them on Tomcat or Oracle Application Server. If you are a rich client developer, you should be able to develop with Swing or Flash. And if integration is your job, you should be familiar with SOA and TIBCO EMS. It's always a plus if you know how to use Ant and how to set up continuous integration for your build with Continuum.

    But if you want to be among the elite of the future, you should take a look at Cloud Computing, Groovy, Spring, JPA, GWT, GlassFish, Jetty, ActiveMQ, maybe Silverlight and of course Maven.

    What do you think? Which IT skills will be the most valuable ones on the job market of the future?




    UPDATE: Colin pointed out to me that I should use quotes when searching for "Oracle Application Server" or "WebSphere Application Server", which results in a somewhat different picture.