Live! From the conference!

MarkLogic User Conference 2009 Blog

Matching Businesses with Consumers at the Leading Directory Service in the UK

An entertaining presenter, Graeme Seaton from Yell.com delivered an interesting combination presentation and demonstration entitled “Matching Businesses with Consumers at the Leading Directory Service in the UK”. He provided the audience with an overview of how his firm discovered MarkLogic Server, how they ended up selecting the company to provide their underlying content framework, and showed the audience how the service works.

[Note: This is a very new implementation of MarkLogic and the company is getting ready to launch a new update soon that will take better advantage of geo-location-based data.]

“We’re all about saving time and preventing our customers from trying to think too much,” Seaton said. “Our aim is to be helpful.”

Yell.com is part search engine, part contextual content matching engine. It’s also a research resource that leverages multiple content types, aggregating and displaying them based on the user’s search query and other information (like postal code). Results provide the usual Google-like listings, but also include suggested and additional useful services, including the ability for the user to telephone the listing they retrieve from their search, toll-free, from their web browser.

The presentation explored the technical and business aspects of the Yell.com service and covered briefly how Yell.com services advertisers (including some interesting reporting features).

Look for dramatic changes in the search arena. It’s a very exciting arena that will have major impacts on the way people use the internet and on how devices that rely on the web may be able to assist us in the future.

Bridging the gap in Clinical Documentation by Applying XML and CDA

A really interesting breakout session was presented by Craig Wilkins of Webmedx. His talk, “Bridging the gap in clinical documentation by applying XML and CDA”, was designed to help attendees understand three things:

  1. Everyone (patients, payors, physicians, public health organizations, etc.) needs better access to information
  2. Documents are the physician’s preferred vehicle (60% of physician documents are dictated and transcribed)
  3. XML and Clinical Document Architecture (CDA) are the best paths toward adopting a better data accessibility and management approach

Wilkins shared some statistics to help the audience understand the problem’s impact on access to healthcare information. He also provided some context on the size of the problem by sharing these numbers:

  • U.S. healthcare spend is $2.5 trillion per year
  • Total U.S. healthcare spend is nearly 20% of GDP
  • 75% of healthcare spend in the U.S. goes to chronic care
  • 18,000 billing codes for procedures, but not one for cures

He provided a great overview of the history of physician-generated documentation. For instance, 60% of all clinical information is handwritten or dictated/transcribed, which delays access to the content because unnecessary steps (scanning, OCR, QA, etc.) slow the process down.

Like other industries, the healthcare arena is adopting content standards. In this case, Wilkins said, “the standard that provides the most opportunity to help us overcome these challenges is an open XML standard known as Clinical Document Architecture (CDA).” The standard helps tackle challenges of importance to the providers of healthcare services as well as to the patients. Separating content from its formatting and making it discoverable means that everyone involved can have real-time access to the content they need from portable devices.

He also discussed electronic personal health record management systems that allow patients to control their own healthcare data. These services provide patients with the ability to record their own information, repurpose information (through syndication) from providers, and integrate with other healthcare-related services, some of which may become available in the future.

Wilkins showed a few very specific examples of recent projects on which his firm has worked that utilized the MarkLogic Server, one of which targets improvements in the transcription quality process.

Document accuracy - Data validation

    • Unstructured age data (narrative provided by the doctor) is compared with the birthdate collected on the patient data form to ensure they are in sync
    • The medication section of documents includes the dosage of medicine prescribed, so the system now compares the dosage prescribed against a database of information on typical dosage suggestions
    • Lab value recommendations from the care plan are compared with laboratory results to ensure there are no discrepancies
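To make the first check concrete, here is a minimal Python sketch of comparing an age mentioned in dictated narrative against the birthdate on the patient form. It is an illustration only, not Webmedx's implementation: the function names, the "58-year-old" phrasing pattern, and the sample dates are all assumptions made for this example.

```python
import re
from datetime import date

# Hypothetical pattern: matches phrases such as "58-year-old" or "58 year old".
AGE_PATTERN = re.compile(r"\b(\d{1,3})[- ]year[- ]old\b", re.IGNORECASE)


def age_on(birthdate: date, on: date) -> int:
    """Return the age in whole years on the given date."""
    years = on.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred in that year.
    if (on.month, on.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def narrative_age(text: str):
    """Return the first age stated in the narrative, or None if none is found."""
    match = AGE_PATTERN.search(text)
    return int(match.group(1)) if match else None


def age_matches_birthdate(narrative: str, birthdate: date, dictated_on: date):
    """Return True or False when an age is stated, None when there is nothing to check."""
    stated = narrative_age(narrative)
    if stated is None:
        return None
    return stated == age_on(birthdate, dictated_on)


# Hypothetical sample values for illustration.
ok = age_matches_birthdate(
    "The patient is a 58-year-old male.", date(1950, 6, 15), date(2009, 5, 13)
)
mismatch = age_matches_birthdate(
    "The patient is a 62 year old male.", date(1950, 6, 15), date(2009, 5, 13)
)
```

A mismatch would flag the document for review rather than correct it automatically, since either the narrative or the form could be the one in error.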

Wilkins’s topic was a big one, and he had more content than time to cover it the way I believe he’d have liked. Nevertheless, he did a good job showing the types of healthcare-specific solutions his firm has worked on. And he made it clear that this is just the beginning of a new era in which patients and physicians alike will be empowered by fast access to healthcare content. By taking the MarkLogic approach, healthcare organizations can provide more meaningful results quickly and less expensively. Now you don’t need a herd of programmers or a CMS that requires hundreds of thousands of dollars in customizations and maintenance to start analyzing, sorting, comparing, and acting upon content of various types (both unstructured and structured) in myriad formats, quickly, safely, and innovatively.

Learn more about Webmedx.

Learn more about MarkLogic healthcare solutions.

Multi-Dimensional Content: Enabling Opportunities and Revenue

The closing keynote address at MarkLogic User Conference 2009 was delivered by Stephen Arnold, president of ArnoldIT, whose session “Multi-Dimensional Content: Enabling Opportunities and Revenue” was attended by a packed house. Arnold, IT guru and author of numerous books, including his latest, “Google: The Digital Gutenberg”, was a riot. His presentation style was delightful and made it easy for the audience to understand the concepts and thought-provoking ideas he proposed.

Arnold told the story of supermodel Tyra Banks’s recent call for short-stature models (for a reality TV show) in NYC. As it turns out, the city police department was called in for crowd control because so many wanna-be models flooded the streets surrounding the audition venue that a traffic jam and a melee ensued. How did so many people find out about the auditions? Twitter. Now the NYPD is using Twitter to monitor local events, especially those of the flashmob type.

His point was that tools like MarkMail could easily be used by organizations to filter through and analyze real-time information streams like those coming out of Twitter. Doing so, he said, would provide users with knowledge they are not able to gain in any other practical way. Such filtering can help organizations make intelligent business decisions based on actionable data.

Arnold told the audience that he believes JetBlue, currently the only profitable airline in the industry, made a smart decision when it created a “flexible framework” using MarkLogic Server, relying on existing, familiar tools instead of trying to teach people lots of new skills. Flexible frameworks are the future of information technology and the foundation on which revenue-generating activities will be built from now on.

“I don’t want to learn how to do new stuff. You don’t either,” Arnold said. What’s needed, he continued, are solutions that help individuals find what they want in the “best possible way” — not in a perfect way, but in the best possible one. Smart companies know this. That’s why they are adopting the framework used by MarkLogic to tackle information challenges.

“I am a strong supporter of the MarkLogic approach,” Arnold told the attendees. “Not because they asked me to be, but because I see the practical applications for their technology.”

“Indecision is the bad thing,” Arnold said. “If you wait (to make a change) and the shift begins, you have little warning to adapt or move.”

Content, Community, and Agile Transformations at BusinessWeek

Isaac Sacolick of BusinessWeek (The McGraw-Hill Companies) presented a 50-minute session this afternoon entitled “Content, Community, and Agile Transformations at BusinessWeek”. The presentation focused on a brand new service provided by BusinessWeek called BusinessExchange. The purpose of the site (which is really not a fair description, it’s more of a service) is to provide users with the next generation of content publishing, provide targeted advertising services for sponsors, and help keep BusinessWeek relevant as the publishing world tries to find its feet.

It’s a new model for a mainstream business magazine, one that I’ve been evangelizing for years. It merges the features of social media with the world of publishing. It breaks the old school model and shreds it to bits.

First, it plays nice with others. Users can import content from other services. For example, provide your login information for other services like LinkedIn and benefit from automatic content population. In other words, content from third parties is syndicated and repurposed in BusinessExchange automatically. No need to rekey information that you’ve already typed in on another site. Changes made to LinkedIn and other third-party content are picked up automatically.

Second, it respects your interests. BusinessExchange asks you about yourself, your interests, and your friends, and then customizes your experience based on what the site knows about you. It uses information provided by you, as well as information coming from third parties, other members, and your online usage habits to personalize your experience.

Third, it leverages the power of social networks. And, in doing so, it provides powerful new ways for you to meet new people, find new clients, get new business, promote yourself, your charity, whatever. All while you are interacting with the site. For instance, if you provide a “reaction” (akin to a blog comment) to an article on the site, the service can create a Twitter Tweet for you, driving more folks back to the original article.

There’s a lot more to say about BusinessExchange, and they’re adding new features in the future, but while writing this article, I created an account, imported my LinkedIn data, and started exploring. I recommend this approach for you, too. After all, sometimes it’s better to see and experience something than to read about it. This is one of those cases.

Other publishers take note. BusinessWeek has set the bar very high and, if you ask me, it’s well-positioned to beat the pants off the competition.

JetBlue - One Year Later and a Year Wiser

Just after lunch today, Murry Christensen and Chris Beckman of JetBlue delivered a 50-minute presentation entitled “A Year Later and a Year Wiser: Lessons Learned from Implementing an Authoring and Delivery System Based on MarkLogic, SharePoint and Word”. The popular and innovative airline was working toward a modular, structured system (and methodology) for content creation, management, and delivery that would help it better manage the intellectual property of the organization and meet regulatory mandates (like ATOS requirements), where appropriate.

JetBlue had all the typical challenges: the need for various employees with different skillsets to create, edit, and deliver content; collaborate on projects; access the content offline; reuse content, etc. They also had a number of industry-specific and regulatory requirements to meet or exceed, as well as corporate requirements and organizational needs.

This situation led to the new system being developed, using a decentralized content creation model. Authoring templates in Microsoft Word helped enforce style and structure, a metadata manager allowed users to enhance the content with rich metadata, and an impact functionality panel gave users the ability to track content reuse. The impact feature is very useful, as it can help users understand how their changes may affect content repurposed elsewhere. The new system also provided multi-channel publishing capabilities and content management functionality (things like change tracking, revision control, workflow).
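The idea behind an impact panel can be sketched as a reverse index from each reusable content component to the documents that include it. The Python below is a rough illustration only, not JetBlue's system: the document and component names, and both function names, are hypothetical.

```python
from collections import defaultdict


def build_reuse_index(documents):
    """Map each component ID to the set of documents that reuse it.

    `documents` maps a document ID to the list of component IDs it includes.
    """
    index = defaultdict(set)
    for doc_id, component_ids in documents.items():
        for component_id in component_ids:
            index[component_id].add(doc_id)
    return index


def impacted_documents(index, component_id):
    """Return the sorted IDs of documents affected by a change to one component."""
    return sorted(index.get(component_id, ()))


# Hypothetical manuals and shared components, for illustration only.
manuals = {
    "flight-ops-manual": ["deicing-procedure", "cabin-safety-check"],
    "ground-ops-manual": ["deicing-procedure"],
    "crew-handbook": ["cabin-safety-check", "uniform-policy"],
}

reuse_index = build_reuse_index(manuals)
affected = impacted_documents(reuse_index, "deicing-procedure")
```

Before editing a shared component, an author could consult such a lookup to see every manual that would pick up the change.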

“The point is, when someone asks for a piece of content, they get the most current version of it,” Christensen said. “When you’re in the airline business, flying folks around in the air in a large aluminum tube, there’s no room for error.”

Chris Beckman, Manager, Corporate Publications for the airline, provided a technological overview of the system and shared some best practices and lessons learned.

One critical piece of functionality that helped the airline meet its regulatory requirements is the ability to map (via cross-references) procedures to the regulations that govern them.

One interesting thing about these presenters is that they weren’t afraid to tell you what they did wrong, what they would do better next time, and what challenges they had. One challenge that some readers may recognize from their own experience revolved around the changes Microsoft made to the graphical user interface in Word. Such changes, while a welcome remodel for some, left other users paralyzed. With some additional training, JetBlue was able to address the problem and get users back into their comfort zones.

Information Fusion: Realizing Cost Benefits and Gains in Efficiency for Government

Guy Filippelli and Jeremy Glesner, both of Berico Technologies, shared several government case studies and made some compelling arguments for using MarkLogic to build solutions designed to tackle information analysis challenges. The case studies were well thought out and, despite the presenters not being able to tell us all of the details (due to the sensitive nature of the solutions they built), the session was very informative.

The guys started the case study portion of the presentation by defining “information fusion”: the aggregation of information, regardless of type, to provide useful views of the information in meaningful ways.

The first case study - The relational approach

The Berico team built a system that was designed to be user-friendly and to display data in meaningful ways to personnel searching for the bad guys in the war in the Middle East. Unfortunately, the tool was so successful that users started bringing the team dozens of new types of content in myriad formats.

While the success of the solution was viewed as a victory, it also created a new challenge: lots of additional work. The developers did not want to lose the data being provided to them, but were struggling to model the content to get it into the relational fields (columns and rows) the system relied upon. The project quickly overwhelmed the team because each new data format created a large amount of work to ready the content for analysis.

[To learn more about the reasons why relational databases are not the best tool for these jobs, read Endless Possibilities: Norm Walsh on the Changing Nature of Publishing.]

The second case study - The MarkLogic Server approach

A similar project, utilizing data from a large number of data sources in a variety of formats, was launched using MarkLogic Server. The immediate and big difference is that the project relied on XQuery (and XML). This approach drastically reduced the skill sets required by the team to deliver a working solution and created a major reduction in workload due to the fact that MarkLogic Server can handle the data in its original format, without the need for a cadre of database programmers with special, expensive, and scarce talents to wrangle it into submission.

Because the content did not need to be modeled as it did in the relational database example, the team could focus their efforts on enriching the content, improving its accuracy, and developing additional innovations.

The final word: Total cost of ownership

The presenters made it crystal clear that it’s less expensive to purchase MarkLogic and use it to develop government data analysis solutions than to try and tackle the same challenges with a relational database approach. MarkLogic Server helped the team improve productivity, automate previously manual tasks, and slash data management costs.

In the short term, it may seem wiser to use a relational database approach, said Guy Filippelli, CEO of Berico Technologies, “but in the long term, it’s much smarter to take the MarkLogic approach.”

Enough About Google, Future Trends in Information Access

“We’re all publishers now,” says Whit Andrews, Vice President of Enterprise Search at Gartner. Andrews opened day two of the MarkLogic User Conference with a humorous story about how he struggled to use a GPS device to find the conference hotel. His example was designed to illustrate the problems consumers have getting the right content, in the right format, at the right time, something many readers of this blog fully understand.

His session was loaded with statistics derived from Gartner research, as well as lots of notable quotes. The gist of his message: People spend too much time searching for, but not locating, relevant content. It’s a big waste of time. And, it doesn’t have to be that way.

Andrews is a component content management evangelist (as am I). He believes that to deliver relevant content — and meet the changing needs of end-users — we must be able to repurpose, recombine, remix content to produce, dynamically, the information our users need, on-demand, when they need it. New tools, techniques, standards, approaches will need to be harnessed to make this a reality. But, in the current economic climate, we should expect organizations to adopt strategies that help them efficiently leverage all of their content — text, video, audio, images, etc. — in ways that are helpful to the end-user content consumer. This approach will lead to drastically improved customer satisfaction and trust, and, for marketing professionals, far better conversion rates.

He used an interesting analogy — beads — to help the audience understand how content components can be swapped out to meet the needs of our customers. It seemed to work, as many folks in the audience were nodding their heads in agreement.

“We are at the tipping point with DITA, RSS, XML, Atom.” It’s possible to provide the right content in meaningful ways, Andrews said.

Some of the predictions were right on target, as far as I am concerned, but I’d imagine the drastic changes that are predicted are very scary for many knowledge workers. Consider this quote:

“By 2013, more than 25% of documents workers see will be dominated by non-textual content,” Andrews proclaimed.

It makes perfect sense. Consumers love video, especially for topic-based content like procedures (how-to instructions) traditionally encapsulated in technical documentation (user-guides, online help, etc.). The technology and standards are available. We’re already able to create components of video content and wrap it in DITA and deliver those components on demand as video documentation.

Andrews talked briefly about what he calls the Hostile Information Ecosystem. The idea behind this concept is that we can’t trust anybody in the searching world: we don’t know that the content they are providing us is legit. And, because we’ve had terrible searching experiences in the past, we don’t trust that the mechanisms for calculating content relevancy are going to provide us with a good experience.

But it doesn’t have to be this way, and if Andrews is correct, delivering relevant componentized content is going to be the norm in the near future, as well as a strategic advantage and competitive differentiator.

“We have to know about our customers better,” Andrews said. “Because by 2012, it’s likely that 75% of the content we interact with will be delivered to us based on what is known about us.”

It’s not enough for us to force our customers to rely on search alone. We need to have search help us find information we need based on what is known about us. We need search to be our friend and to know us the way our friends do.

Of course, he’s talking about dynamic personalized content, delivered, regardless of where you are, on whatever device you need it, where you need it, and how you need it. Sounds like a pipe dream, but it’s not. Some organizations are doing it already.

“Search can work perfectly when it relies on the ‘wisdom of context’,” the deep, relevant knowledge that friends know about us, Andrews said to close out his session. I believe he’s right.

Read Whit Andrews’s blog.

Learn more about content component management systems and XML authoring tools.

The Impact of the Google Book Search Settlement

Since its inception, the Google Book Search initiative has caused tremendous controversy within the publishing industry. While the recent settlement between Google and the publishers may have allayed some of the key concerns, urgent questions remain on how the industry can effectively capitalize on the agreement. This closing keynote session was an interview between MarkLogic CEO Dave Kellogg and Daniel J. Clancy, PhD, engineering director for the Google Book Search project.

Clancy talked about the three types of books included in the service: books that are in print and protected by copyright, books that are out of print but still protected by copyright, and books no longer covered by copyright.

The Google Book Search project includes 10 million scanned books, averaging 330 pages per book. About half of the index is English-language books.

Interestingly, Spanish language books are accessed three times more often than English language books.

Publishers don’t always disagree with the approach Google is taking. In fact, some like it a lot because their inclusion in the Book Search project increases sales for legacy books. Google provides links to where users can purchase the books included in the index.

According to the settlement, there are a variety of ways that rights holders (folks who have copyright on books scanned and included in the index) can choose to participate. Two such options are to remain in the index and get some cut of the sales (yes, Google sells access to book content) or they can ask to be removed altogether, and Google will honor their request.

When asked by an audience member how Google is handling multiple editions of the same book, Clancy said they treat the books like a library would. So, yes, they scan and index multiple editions and versions in different languages (translations).

While book publisher rights are often the focus of discussions about digital books, an audience member asked what would happen to passing books on to used book stores?

“Certain aspects of the used book market are based on scarcity,” said Clancy. “Scarcity goes away in a digital space.”

On the subject of the perception that there’s a competition between book archivists and Google Book Search, Clancy said, “What happens 20 or 50 years from now when Google isn’t here? It’s important that multiple parties are scanning and making books available 100 years from now.”

It was a very interesting conversation that provided much food for thought.

[Note: Clancy was not saying Google will not be here in 20-50 years, but that the world is a very uncertain place and you never know what the future will hold.]

To learn more about the Google Book Search class action lawsuit (and to ensure I didn’t miscommunicate the details), visit the Settlement Agreement website.

Learn more about the opportunities that result from the settlement in this white paper from MarkLogic.

Discover which libraries are helping Google scan books.

The opening session video at MarkLogic User Conference 2009 (#MLUC09) in San Francisco, May 12, 2009.

Why Publishers Should Be Investing in a Downturn

The last breakout session of the day was a panel discussion featuring five representatives from the publishing industry. This session was particularly interesting because recent research from IDC indicates that companies that invest during a downturn exit 30% stronger than their competitors. Given the pressures the publishing industry is already under from declining print and advertising revenues, it was surprising to find that all five panelists indicated that their budgets were increased this year.

The panelists included Brian Bishop, Director of e-product Development and Innovation for Springer Science and Business Media; Shannon Holmann, McGraw-Hill Higher Education; Bill Hughes, Pearson; Marilynn Jacobs, Vice President of Marketing, Quebecor World; and Maureen McMahon, President and Publisher, Kaplan Publishing.

According to the panelists, content management is a big driver for publishers seeking to develop new products and leverage legacy content. But for product development to take place, publishers need skilled, well-trained content creators and managers who combine an understanding of traditional publishing with web 2.0 and social media talents, and such people are in short supply.

Web technologies are also important to publishers, but not all of the panelists are spending money on implementing new web-based solutions.

On the subject of innovation, publishers say the absence of solid models and metrics is making it challenging to branch out. Institutional rigidity (fear of risk) is also hampering innovation.

The economy is playing a role in what publishers spend on technology. Most panelists said they utilize a hybrid approach to innovation, spending on new technologies while also trying to get incremental value from updating existing products. But, new projects designed to introduce innovation seem to be the clear winners.

“We got far more benefit out of the innovative spending than we did incrementally improving systems,” said Bishop.

Technological improvement decisions are being made using a more collaborative approach. Editorial, IT, and business are more often finding ways to work together to solve complex publishing challenges.

“We’re getting better at the upstream conversation between the technology groups and the innovators,” Bishop said.

Publishers are now collaborating with others in the organization to bring innovation to the fore.

The last question the panelists were asked was whether they think their budgets will be higher, lower, or the same next year. The answer: A mixed bag. Some yes, some no or unsure.