Force.com Data Modeling: The Advantages of Denormalization – Dreamforce ’13 Session Highlights

I recently had the opportunity to deliver my presentation Force.com Data Modeling: The Advantages of Denormalization to a sold out room at Dreamforce ’13. Based on the enthusiastic response and the feedback we received following the presentation about the lack of resources available on Salesforce Platform data modeling, I wanted to post highlights from the session here to give the community a starting point when thinking about designing a Salesforce data model. The session was inspired by a post entitled Force.com Data Modeling – Normalization is Not Your Friend that we published on this blog in 2010.


Goals

What are our goals?

  1. We need to understand why data model design so important in the Salesforce world.
  2. We need to understand how the Force.com data model is much different than the relational databases that we grew up with.
  3. We have to “unlearn” what we have been taught about normalization patterns to apply what’s best for our Salesforce implementations.

Game Plan

Here’s the game plan.

We’re going to start at a high level and talk about the core differences between Force.com and traditional relational databases.

From there we’ll talk about how we need to balance requirements – which many times compete with each other – to design data models for Force.com. This is where the “normalization vs. denormalization” argument will play itself out.

We will cover design philosophies and best practices, and explore the pros and cons of these design patterns.

Database

What is normalization? Normalization is when you organize the tables and columns of a database to cut down on dependencies and data duplication. When we normalize, we try to break things down to a level where they exist only once in the database, and then we mash everything back together with junction tables and complex queries.

  1. You don’t want to have data appear in multiple places and be inconsistent (the “update” anomaly).
  2. You don’t want to create dependencies that prevent you from writing important data if you don’t have 100% complete data (the “insertion” anomaly).
  3. And, conversely, you don’t want to create dependencies that cause you to lose important data if you just want to delete a subset of the data (the “deletion” anomaly).

Denormalize for the cloud

In a highly transactional database where you have a large volume of write operations, normalization makes a lot of sense from the perspective that these write operations will be more efficient and perform better when stripped down to the bare minimum.

Now that we live in the cloud, and the way that we interact with data has changed, there is a growing movement to denormalize data models to improve performance and usability.

Why? If you think about how we interact in the cloud, we have human beings and machines (i.e. The Internet of Things) interacting with data sources across the Internet. Most of this interaction entails consuming data, not necessarily writing data.

From that perspective databases and APIs are being architected for read performance, and that means simplifying queries and flattening data structures as much as possible. How does a flatter data model improve performance? By reducing expensive joins.

This also creates a natural decoupling from rigid data modeling patterns, allowing people and apps to use the data in the most effective way for each specific application. This in turn also promotes scalability by allowing a theoretically unlimited number of consumers of the data, because dependencies between the data model layer of the architecture have been abstracted or removed completely.

So we have, at a high level, two competing philosophies. Now let’s throw Force.com into the mix.

Force.com

Force.com is a different animal completely. In the past, we owned the infrastructure and the databases. We could tune them to improve performance. We could make up for sloppy code in our applications by throwing more hardware at the problem or building indexes in our databases.

It was like we owned a house and we could tear down or build out whatever we wanted to in order to make things work.

Multitenant Architecture

Not with Force.com. We don’t own the house anymore. In the Force.com multitenant architecture, we rent an apartment.

We can do whatever we want to within the walls of this apartment, but we can’t just knock down a wall and add on. We have to work within our constraints to design the most effective solutions.

Under the Hood

Under the hood there are some important architectural items with Force.com that are important to understand when we’re designing data models.

The first is that you do not get direct access to the database. We are working with a metadata-driven database, where our data is represented as an abstraction of what is stored in the underlying relational database.

Object names, field names, types, etc. are all used to “describe” and provide context to the underlying data. If you haven’t done so already, I highly recommend reading “The Force.com Multitenant Architecture” white paper on DeveloperForce. It is a great resource that will help you understand the “Why” behind the Force.com architecture and why we need to look at it differently.

This concept of “describing” is important as we think through how to design and implement data models in Force.com.

If you think about CRM at its core, we are using Salesforce to describe our relationship with our customers. Every object that comes out of the box is used to describe something, whether it’s a company, a person, a selling cycle, a business process, or the relationships between all of these things.

SQL vs. SOQL

Another key difference between Force.com and the traditional databases that you’re accustomed to is the query language.

Do you know SQL? Do you know SOQL? Similar construct, but some big differences, right?

There are some great resources on DeveloperForce and in the Force.com community that break down SQL vs. SOQL, but the most important thing to keep in mind is that the ability to perform joins in SOQL is extremely limited.

In order to pull data from multiple objects in the same query, those objects must be explicitly related to each other.

If you start normalizing your data model in Force.com, you are going to run into serious issues with navigation, analytics, and security because normalization creates complexity. And because of SOQL and platform limitations, you’re going to have to write more and more code to piece together your data.

Because SOQL does not have the ability to join objects that are not related, if you want to perform joins you will have to perform multiple queries, cast the results to collections, most likely Maps, and write methods to iterate over the data sets and join the data manually.

Whereas if you denormalize and keep your data model as simple and flat as possible, you can keep your SOQL queries concise and avoid what amounts to writing numerous MapReduce functions in Apex.

What are some examples of Force.com denormalization? Address and phone fields are the most obvious examples. What about picklists and multivalue picklists? Do those require joins when retrieving vales? No, the values are stored as text in a record on an object.

This segues to my core philosophy for Force.com data modeling.

Entities

Entities describe something that should exist once and only once.

  • There is only one “you” in the world. “You” should only exist once in Salesforce.
  • In Force.com, entities are represented by objects and specific records within an object.

Attributes

Attributes provide individual “pieces” of context that together describe an entity as a whole.

  • “You” have a hair color. It is brown or blonde or red. Moscone West has a street address. It is 800 Howard Street.
  • In Force.com, attributes are represented by fields and types.

Relationships

Relationships provide context to describe how entities are connected, how they are ordered, and the conditions that make the relationship meaningful or unique.

  • “You” have relationships to other human beings. You have friends, coworkers, parents, grandparents, siblings, significant others. Your company has relationships with other companies. They can be suppliers, customers, partners.
  • In Force.com, relationships can be represented by both fields and objects. Relationship fields provide the direct 1:1 connection between entities, and additional fields provide the context to describe that connection. You can use junction objects to describe many to many relationships.

Balance

The key to designing effective data models is to find the balance between competing requirements and priorities.

  • Analytics. With Salesforce, you almost always have to begin with analytics in mind. How you design your data model impacts how you can organize data for reports and dashboards.
  • User Experience. Your design might be functional, but do people want to use it?
  • Clicks vs. Code. Sure, you can write code to make Force.com do just about anything short of parking your car for you, but can you meet requirements with out-of-the box functionality if you take the time to get the data model right? Think about the operations and maintenance implications before solving a problem with code.
  • Security. Are you creating an overly complex sharing model?
  • Performance. Are your response times lighting fast, or can you measure them with an hourglass? Performance also depends on the data volumes in your Salesforce org and how your company interacts with data.
  • Scalability. Will this data model grow with your business?

Force.com Data Modeling

Understanding how and when to represent entities, attributes, and relationships in Force.com is the most important concept you can master, as it underlies every decision you have to make when designing data models.

Beyond the constraints and limitations of Force.com, there really are no “hard and fast rules” to data modeling. What works for some might not work for others.

Here are some best practices that hold up in many situations.

Hierarchy

Embrace hierarchy.

  • We learned that SOQL needs objects to be explicitly related to each other in order for them to be included in the same query.
  • We know that we can traverse up to 5 levels upwards and 1 level downward in a relationship query.
  • Use formula fields to reference data from parent objects to provide additional context to child records without duplicating data.
  • Use rollup summary fields to provide context on parent records of Master-Detail relationships in lieu of creating triggers if it is an option.
  • Establishing Master-Detail relationships also enables cross-object workflows, allowing you to perform actions such as updating field values on parent records when workflow criteria is met on a child record.
  • Designing for hierarchy will enable deep analytics as you can create custom report types that give you three levels of parent-child-grandchild relationships, with the ability to traverse relationships at each level to bring in additional attributes.
  • There are performance implications if you go too crazy with relationship queries, especially in organization with large volumes of data, but for most organizations, designing with a hierarchy in mind will provide you with a solid foundation for your data model.

Know Your Siblings

Know your siblings.

  • Just because two objects are children of the same parent doesn’t mean that they have any knowledge of each other.
  • Avoid visibility issues between siblings by defining explicit relationships wherever possible.
  • Salesforce rolled out Joined Reports to counter this limitation in analytics, but if you are working in Apex, you will need to cast query results to Maps and perform iteration functions to match up data from the siblings.

Record Types

Learn to love the Record Type.

  • Rather than creating multiple objects for entities that are similar but require unique attributes, use Record Types to keep your data model flat and denormalized by segmenting entities into subtypes.
  • Record types allow you to display different page layouts based on User Profiles. You can also limit which Record Types are visible to specific Profiles.
  • Instead of creating different picklists for different entity subtypes, you can share common picklists but only include relevant values on a record type by record type basis.
  • Record types are indexed by default, giving you an ideal way to filter results and improve the performance of queries.

Avoid Clutter

Avoid clutter.

  • You get 500 custom fields per object in Enterprise Edition, and 800 in Unlimited Edition. Don’t be afraid to use them, and by all means flatten your data model as much as possible to simplify navigation and analytics, and to improve query performance.
  • But be smart about it, don’t let your data model become a mess.
  • Avoid clutter in your page layouts.
  • Remove deprecated fields as soon as possible.

Junction Objects

Learn to use Junction objects and a whole new world will open up to you.

  • Junction objects are similar in concept to junction tables in relational databases in that they allow you to establish M:M relationships between objects.
  • This is key – junction objects allow you to reference data and use it in multiple places without duplicating it. Because you are establishing relationships between objects, you have the ability to create related lists, giving you multi-dimensional views of relationships.
  • You’re probably already using junction objects today without even being aware of it. Contact roles on Accounts? You’re joining multiple contacts to accounts without duplicating any data – you don’t have to re-create contacts at multiple accounts just to establish a relationship between them. The contacts can exist once and only once in Salesforce. Granted, contact roles are limited in their functionality and may not be the best example, but it’s a common use case.
  • To create a junction object you simply need to create a new custom object that has two relationship fields on it.
  • A “pure” junction object will have two Master-Detail relationship fields, but I don’t always like to lock into Master-Detail relationships due to the restrictions on them. And sometimes creating Master-Detail relationships is not an option. One of the cool things about using two M-D fields with your junction object is that when you configure the related list on one of the joined parent objects, you can add any field from the other joined object as a column in the related list.
  • Beyond simply joining two records together, you can add attributes – custom fields – to a junction object to provide additional context around the relationship being established.
  • Again, creating additional objects may seem like we’re talking about normalization, but if you think about what we’re doing here, we are creating new entities that describe something unique – the relationship between two or more entities. Beyond the fact that junction objects are the only way to establish M:M relationships, embracing them can simultaneously allow you to simplify your data model while enabling new dimensions for navigation and analysis.

Avoid Duplication
Avoid data duplication.

  • Always reference with formula fields where possible and avoid copying data unless it is your only option.
  • If you cannot reference data because there is no explicit relationship, ask yourself if it’s feasible to create a relationship.
  • Can you avoid duplication if you were to use a junction object?
  • Avoid picklist sprawl – if you find yourself with two or more picklists that contain similar values, look into replacing them with a lookup to a related object. This somewhat flies in the face of the whole “denormalization” argument, but if you can design an object into your data model in a way that allows for referencing from multiple child objects and adding context to the relationship, you will save yourself the administrative overhead of having to update picklists on multiple objects whenever values change.

 

Where do we go from here? What are our calls to action?

  • Take a look at your current data model. What can you do right now to make it better?
  • Experiment. Do you have your own Developer Edition org? If not, go to developerforce.com right now and sign yourself up. You can’t break anything, so get in there and play around.
  • Apply the principles that we introduced here if they make sense for your situation.
  • Read more about Force.com data modeling. There are so many resources on Developer Force, and there are a number of great blogs out there that can help you.
  • Get involved in the Salesforce community. Do you participate in the Salesforce Success Community? The Salesforce community is like nothing I’ve ever been involved with, there are great people out there doing great things, and we all want to help you succeed.
Mike Topalovich Salesforce Technical Architect in Chicago
Mike Topalovich - Salesforce Certified Force.com Platform Developer I Mike Topalovich - Salesforce Certified Force.com Platform Developer II Mike Topalovich - Salesforce Certified Force.com Developer Mike Topalovich - Salesforce Certified Force.com Advanced Developer
Mike Topalovich - Salesforce Certified Mobile Solutions Architecture Designer Mike Topalovich - Salesforce Certified Force.com Platform App Builder Mike Topalovich - Salesforce Certified Administrator Mike Topalovich - Salesforce Certified Advanced Administrator
Mike Topalovich
Hi, I’m Mike. I help companies like yours do business in new ways with Salesforce.

I am a freelance Salesforce Developer, Architect, and CTO as well as a part time instructor for Salesforce University.

Connect with me today to discuss how I can become a part of your team on an ongoing retainer basis.
Mike Topalovich on EmailMike Topalovich on FlickrMike Topalovich on LinkedinMike Topalovich on RssMike Topalovich on Twitter

One Comment

Trackbacks & Pingbacks

  1. Reading Almost Everyday | cc2013ex

Leave a Reply