An introduction to the concepts behind the Ning Content Store, Ning's highly scalable, object-oriented storage system.
Some of the most common questions from developers new to the Ning environment have to do with the Ning Store (XNS from now on). These questions fall roughly in one of two categories:
Both stem from the fact that XNS represents a subtle but fundamental shift in how we think of data and storage. While the ideas behind XNS aren't completely new (Lucene, most notably, implicitly uses a similar model for dynamic attributes, and the Network Model from the 60s has some resemblance to our object references), their expression and flexibility are.
Important note: the code included in this page, while correct, is intended to illustrate and expand on the text, rather than serve as direct examples for use in Ning Sites.
Before going into what makes XNS different, let's briefly revisit the basic concepts of relational databases as we understand them today.
Relational DBs are a good data point for three reasons:
Why go through relational theory, even if briefly? Short answer: I think we have come to take for granted certain assumptions about how data is managed, and it is useful to go back to the underlying concepts and reasons for those assumptions. Only then can we properly frame the context of a given solution, as well as how current technology may have affected those assumptions, without a reflex response like “I can't do anything without a well-defined schema!”
Databases have been in use since the early days of computing. Initially the Network and Hierarchical models were used in early mainframes. In 1970 Codd published what can be considered the seminal paper on relational database theory, “A Relational Model of Data for Large Shared Data Banks,” while working at IBM. The relational model did not focus on the software itself, but rather on the theory behind data management, creating a solid theoretical framework to understand data operations such as read, write, query, etc.
The fundamental concept of the relational model is that data and its operations can be represented as a collection of relations between sets, defining both an algebra and a calculus to manipulate data. Since the relational model is based on set theory, it can guarantee the results of operations given certain assumptions, and through the 80s and 90s it became the dominant form of thinking about data storage and retrieval. See Appendix A: A bit of relational theory for a bit more discussion on relational theory.
The theoretical framework of the relational model is represented in relational databases as follows: each database contains a set of tables, each defined by a set of fields, or columns. Tables in turn contain rows, which are the actual instances of the data stored. One or more of the table's columns can be keys, but a table has only one primary key, which can be either a single column or a composite of several columns.
In A Brief History of Time, Stephen Hawking said that he'd been told that for every equation he included in his book he'd cut his audience in half. He still included one equation (E = mc²) even at that risk. Since it is possible that there will be an odd number of readers for this document, I am quite worried about the pain it will cause to the person that will have to remain as 0.5 of the readers. So as an alternative to requiring an even number of readers for the document, I'd rather avoid equations altogether. So if you hear the noise of a chainsaw nearby, don't worry, it's not coming for you.
Even if we avoid equations, the description in Appendix A: A bit of relational theory clearly shows how restrictive the relational model is. In the constraints lies its power. With them, you can guarantee that if an operation maintains the constraints, then your data set as a whole will remain valid. Furthermore, it will guarantee that if you apply the constraints, invalid operations will be rejected.
These two features of the relational model are the reason why relational integrity can be enforced at the database level, which leads to the database itself guaranteeing that relations are valid.
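To make the idea of the database itself rejecting an invalid operation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are our own assumptions, loosely modeled on the Person/Address example used later in this document, and SQLite stands in for any relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE address ("
    " address_id INTEGER PRIMARY KEY,"
    " person_id INTEGER NOT NULL REFERENCES person(person_id),"
    " street TEXT)"
)

# Valid operations: the address points at an existing person.
conn.execute("INSERT INTO person (person_id, person_name) VALUES (1, 'John Smith')")
conn.execute("INSERT INTO address (person_id, street) VALUES (1, '123 Street')")


def try_insert_orphan_address(connection):
    """Try to insert an address for a person that does not exist."""
    try:
        connection.execute(
            "INSERT INTO address (person_id, street) VALUES (99, '333 Street')"
        )
        return "accepted"
    except sqlite3.IntegrityError:
        # The database, not the application, refused the operation.
        return "rejected"


result = try_insert_orphan_address(conn)
address_count = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
print(result, address_count)
```

The application code never checks whether person 99 exists; the constraint declared in the schema does that work.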
The separate issue of transactional integrity arises in any multithreaded, multiuser, or distributed database, where more than one process may be attempting to modify the same piece of data.
Transactions imply a locking facility (locks can be of various types, and databases can support different levels of concurrent locking). But transactions can be built on top of a non-transactional database, just as referential integrity checks can be added on top of a flat file system.
And therein lies one of the central elements of the shift that XNS represents: who is responsible for maintaining information on the data's relations and data integrity, and why.
We have come to expect that database systems will behave in a certain way, and include certain features (such as transactions and locking). But just as in many cases these features are required, in many others they are overkill, both from a performance point of view, and from the complexity they impose on the data-management code.
We have to take one more step back now, and look at the underlying reasons for storing and retrieving data. If I write a simple application that will be accessed only occasionally, and only by me, do I need to know what “Two Phase Locking” means? Probably not. If I'm hacking away at a testing application, will I care about referential integrity like a bank does? Maybe. But when I can easily wipe out all my data and start all over, when storage becomes for all practical purposes unlimited, the lines start to blur.
Ning targets different types of developers, and many of them won't be familiar with, and should never need, advanced database concepts. So the question is how to provide basic, flexible functionality that works well and that allows more complex functionality to be layered on top.
While the relational model became the dominant form of database, it is only one way in which to achieve the actual functions a database must perform: storage, retrieval, updates (including delete) and querying. These functions should conform to the ACID properties, namely atomicity (a transaction is applied completely or not at all), consistency (a transaction takes the data from one valid state to another), isolation (concurrent transactions do not interfere with one another), and durability (once a transaction completes, its changes persist).
By 'Transaction' here we mean a set of related operations that write, update, or read content from the database.
Now that we've covered the basics, we can move on to what XNS is, and how we envision it would be used.
Perhaps what is most immediately apparent about XNS is the specification of object and/or “table” properties: they don't exist. XNS in effect leaves the responsibility of maintaining the schema up to the application layer, allowing the developer to dynamically create “fields” within content elements. For example, without having defined anything in advance, we can do the following using our PHP API:
$content = XN_Content::create("SomeContent", "example content object");
$content->title = $feed->title;
$content->description = $feed->description;
$content->my->url = $url;
XNS content elements (or “content objects”) include two types of fields: pre-defined system fields (such as title, description, and createdDate) and user-defined fields. User-defined fields, which exist under the “my” namespace, can be created or removed dynamically. They don't have to be specified beforehand, and they are not restricted in any way.
The first direct consequence of the schema being defined dynamically at runtime by the application is that data creation and access can flow more naturally within the application code. Take for example the notion of defining an object to store a Person's name and their possible addresses. With properly normalized SQL, this would appear as follows:
PERSON(PERSON_ID, PERSON_NAME);
ADDRESS(ADDRESS_ID, PERSON_ID, STREET, CITY_ID, COUNTRY_ID);
CITY(CITY_ID, CITY_NAME);
COUNTRY(COUNTRY_ID, COUNTRY_NAME);
These four tables have to be specified in advance, along with their attributes. With standard SQL access, the PHP code would look something like this:
// Uses the legacy mysql_* extension (removed in PHP 7) and assumes an open connection.
$name = mysql_real_escape_string('John Smith');
$sql = "INSERT INTO PERSON ( PERSON_NAME ) VALUES ( '" . $name . "')";
mysql_query($sql) or die(mysql_error());
// mysql_insert_id() returns the ID generated by the last INSERT on this connection.
$personid = mysql_insert_id();
$cityname = mysql_real_escape_string('Palo Alto');
$sql = "INSERT INTO CITY ( CITY_NAME ) VALUES ( '" . $cityname . "')";
mysql_query($sql) or die(mysql_error());
$cityid = mysql_insert_id();
$countryname = mysql_real_escape_string('US');
$sql = "INSERT INTO COUNTRY ( COUNTRY_NAME ) VALUES ( '" . $countryname . "')";
mysql_query($sql) or die(mysql_error());
$countryid = mysql_insert_id();
// First address: 123 Street, Palo Alto, US
$street = mysql_real_escape_string('123 Street');
$sql = "INSERT INTO ADDRESS (PERSON_ID, STREET, CITY_ID, COUNTRY_ID)
VALUES ( " . $personid . ", '" . $street . "', " . $cityid . ", " . $countryid . ")";
mysql_query($sql) or die(mysql_error());
// Second address: a new city, same country
$cityname = mysql_real_escape_string('Menlo Park');
$sql = "INSERT INTO CITY ( CITY_NAME ) VALUES ( '" . $cityname . "')";
mysql_query($sql) or die(mysql_error());
$cityid = mysql_insert_id();
$street = mysql_real_escape_string('333 Street');
$sql = "INSERT INTO ADDRESS (PERSON_ID, STREET, CITY_ID, COUNTRY_ID)
VALUES ( " . $personid . ", '" . $street . "', " . $cityid . ", " . $countryid . ")";
mysql_query($sql) or die(mysql_error());
The ADDRESS_ID field (as well as the other IDs) is incremented automatically by the DB if it's not included in the insert operation. If a new attribute is required (for example, ZIP) then the table definition has to be modified; the code cannot impose a new structure dynamically on the tables. The advantage, of course, is that the strict data model definition makes it easier to derive tests, verify operations, and so on. The disadvantage is that developers who can't or won't make the upfront investment of time required to define the model properly cannot get beyond that point. There is a second disadvantage, which is that of restricting the ability of users to change the data model, but we'll get to that later.
So how is this represented with XNS (again, using the PHP API)? There's nothing to it: the attributes are just added as needed.
$person = XN_Content::create("Person", "person");
$person->my->name = 'John Smith';
$person->my->address1 = array('123 Street', 'Palo Alto', 'US');
$person->my->address2 = array('333 Street', 'Menlo Park', 'US');
The XN_Content::create("Person", "person") call specifies an object of type Person with the description “person”. “Person” can be any string; in this case we're using a descriptive name, similar to that used for a relational table, but it could really be anything the developer wants to use.
We could argue that one or the other example could be made more or less verbose. The first example in particular could benefit from higher level SQL layers that simplify certain operations. But the point is not to compare the syntax advantage of XNS in PHP (which nevertheless exists) but to look at the difference a strict mapping creates.
Suppose, for example, that we now want to add a new field, 'ZIP', to the ADDRESS table. With a relational database, we'd have to add the field to the DB, then add that field to the INSERT statements. Conversely, without modifying the database we could not add the field in the code. Using XNS, we'd only need to add the value for the ZIP in the arrays. If, say, a new field had to be added for the person's information (e.g., phone number), SQL would require the same work as in the previous example, whereas XNS would allow the developer to just do
$person->my->phone = '444-5555';
In addition to setting the other fields.
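The contrast between the two approaches can be sketched outside of PHP as well. The following is a rough illustration, not Ning code: SQLite (via Python's sqlite3 module) stands in for the relational database, and a plain dictionary stands in for a content object whose attributes are created on assignment. All names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (address_id INTEGER PRIMARY KEY, street TEXT)")


def insert_with_zip(connection):
    """Insert a row that uses a 'zip' column, which may or may not exist yet."""
    try:
        connection.execute(
            "INSERT INTO address (street, zip) VALUES ('123 Street', '94301')"
        )
        return "ok"
    except sqlite3.OperationalError:
        # Raised because the table definition has no such column.
        return "no such column"


# Relational side: the code cannot use the new field until the schema changes.
before = insert_with_zip(conn)
conn.execute("ALTER TABLE address ADD COLUMN zip TEXT")
after = insert_with_zip(conn)

# Schema-less side: a new attribute simply appears when it is assigned.
person = {"type": "Person", "name": "John Smith"}
person["phone"] = "444-5555"

print(before, after, person["phone"])
```

The relational insert only succeeds after the table definition is altered, while the dictionary accepts the new attribute immediately, which is the behavior the text attributes to XNS.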
The previous examples show a reference to “my->” in PHP. Objects have a system namespace which includes default attributes common to all objects. The system namespace includes attributes like title, description, and createdDate, among others. Ning reserves the system namespace to be able to add attributes to it freely without collision in the future with attributes that the developers have defined. Which is where my comes in.
Simply put, “my” is a namespace under which developers can create any attributes they need, dynamically, during runtime. Attributes are defined by three values: a content type (which is defined by a string), a name, and a value.
The name of an attribute can be any UTF-8 string. For simplicity, attribute names will generally be “human-readable”, but nothing prevents an application from creating a system that defines attributes dynamically with whatever names make sense internally, such as “xakf59594.”
The value of an attribute can be a byte array of any size. The limits on this relate to the amount of in-memory data we allow on playground machines, storage for an application, and so on; that is, they are artificial limits imposed by us, PHP, etc., rather than by the design.
The content type of an attribute is defined using MIME Types. The default type is text, but it can be overridden with one of the following values:
The internal storage details of each MIME type are not exposed by XNS; storage of different types and sizes is transparent to the developer.
Use of these types is made easier through pre-defined constants in the languages Ning supports. In the future, we may support checks and validation for specific types, such as XML validation for types text/xml, character set validation when a type such as “text/plain; charset=utf-8” is specified, and so on.
A note on binary and text objects
To specify binary or text information in an attribute, you can use the standard application/octet-stream and text types respectively. If the text you are inserting is of a certain subtype, such as text/html or text/xml, you can specify that as well. However, text alone is a valid way of specifying only the primary type.
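As a way of picturing the three values that define an attribute, here is a small Python model. It is only a sketch of the idea: the Attribute class and make_attribute helper are hypothetical names of ours and say nothing about how XNS stores attributes internally.

```python
from dataclasses import dataclass

# Assumption for this sketch: the default content type is the bare "text" type.
DEFAULT_CONTENT_TYPE = "text"


@dataclass(frozen=True)
class Attribute:
    """A (content type, name, value) triple; the value is always bytes."""
    name: str
    value: bytes
    content_type: str = DEFAULT_CONTENT_TYPE


def make_attribute(name, value, content_type=DEFAULT_CONTENT_TYPE):
    """Build an Attribute, encoding str values as UTF-8 bytes."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return Attribute(name=name, value=value, content_type=content_type)


url = make_attribute("url", "http://example.com/feed")
page = make_attribute("page", "<p>hello</p>", "text/html")
blob = make_attribute("thumbnail", b"\x89PNG", "application/octet-stream")

print(url.content_type, page.content_type, blob.content_type)
```

Plain text needs no type at all, a subtype such as text/html is given explicitly, and binary data is tagged application/octet-stream.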
As we have just discussed, the my namespace allows developers to define attributes dynamically, and therefore XNS doesn't have a predefined DB schema as such. The schema becomes a construct that can be enforced by applications (if they wish to do so).
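What might an application-enforced schema look like? One possible shape, sketched here in Python with invented names (PERSON_SCHEMA and validate are ours, not part of any Ning API), is a check the application runs before saving an object.

```python
# Hypothetical schema chosen by the application: required field -> expected type.
PERSON_SCHEMA = {"name": str, "phone": str}


def validate(obj, schema):
    """Return a list of problems; an empty list means obj conforms."""
    problems = []
    for field, expected_type in schema.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
        elif not isinstance(obj[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


# Extra attributes ("nickname") are allowed; the schema only states a minimum.
ok = validate(
    {"name": "John Smith", "phone": "444-5555", "nickname": "JS"}, PERSON_SCHEMA
)
bad = validate({"name": "Jane Smith"}, PERSON_SCHEMA)

print(ok, bad)
```

Because the check lives in the application, it can be as strict or as loose as the application needs, and it can change without touching the stored data.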
For simplicity, we will take the example of a Person's information above, which, properly normalized, leads to a separate table for address information. The city and country are also the result of normalization, with cities and countries stored in their own tables and referenced through foreign keys in the address table.
So. Where the relational model creates something like this:

With strict correspondence between rows in one table and rows in the other (through the foreign key references).
The XNS model, though, can be better understood as an “object continuum” where objects can share certain attributes and not others:
| type | name | street | city | country |
| Person | Joe Smith | 123 St. | PA | USA |
| Person | Jane Smith | (not present) | PA | USA |
| ... | ... | ... | ... | ... |
The object continuum can then be “sliced” across any of the axes represented by attributes. “Wait a minute!” I hear you say. “This newfangled 'object continuum' looks a lot like the tables we were looking at before, except it's all squashed into a single row! Plus you're duplicating all the country/city/address information that the normalization helps with!”
Let's take the last comment first: XNS does not make explicit how the objects are stored. Therefore, a developer should worry about what is semantically correct, rather than worry about duplicated information. To solve the specific problem of updates (i.e., if you want to make sure that changing, say, “US” to “USA” has to be done only once instead of for every row), it is possible to create references in XNS, a topic we'll cover later. For the moment, the relational model seems to have a slight edge (because updating only one value updates them all) at the cost of much more complexity.
As far as the first point (“this doesn't look that different”)—well, for this simple case it does look similar.
But imagine that now we've also got a list of companies (changing requirements: that never happens, right?). Companies, aside from their addresses, would have a company name and an industry in which they operate.
How would this look using the relational model? Like this (the lines, as before, represent the foreign key relations in the tables)

But looking at the same data with XNS, suddenly the object continuum notion becomes more visible.
| type | name | street | city | country | industry |
| Person | Joe Smith | 123 St. | Palo Alto | USA | [empty] |
| Person | Jane Smith | [empty] | Palo Alto | USA | [empty] |
| Company | XYZ Inc. | 233 Avenue Ave. | Mountain View | USA | Services |
| Company | Hal Corp. | 433 Jupiter Orbit | [empty] | USA | Software |
And here's where the XNS 'object continuum' becomes powerful. Because there are no predefined tables, and, from the developer's viewpoint, the data exists in a single, dynamic table, complex queries become much simpler. In this example, a query that would return all the names of either companies or people that are located in Palo Alto would mean an SQL statement that, at its simplest, would be as follows:
SELECT person.name, company.name FROM company, person, city WHERE city.city_name = 'Palo Alto' AND (company.city_id = city.city_id OR person.city_id = city.city_id)
Which implies a JOIN operation, and would then require separate iteration for each result field (i.e., person.name and company.name).
In XNS, however, it would be as simple as:
$query = XN_Query::create('Content')
    ->filter('city', '=', 'Palo Alto')
    ->filter( XN_Filter::any(
        XN_Filter('type', 'eic', 'Person'),
        XN_Filter('type', 'eic', 'Company')
    ));
$things = $query->execute();
foreach ($things as $thing) {
    print $thing->name;
}
Which would print out all the names for both people and companies.
To the skeptic, this may still not appear to be that much of an advantage (don't smirk, you know who you are). But if you have five tables to join, instead of two, the complexity of the SQL required grows significantly, while XNS would only require an additional filter for each new type added. Anyone who's ever had to deal with complex JOINs can appreciate the advantage of this approach.
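For readers who want to poke at the “object continuum” idea without a Ning account, the following Python sketch models it with a list of dictionaries holding the same data as the table above. The query helper is a hypothetical stand-in for the filter chain, not the XNS implementation, and comparing types without regard to case is an assumption on our part.

```python
# One flat collection; each object carries only the attributes it actually has.
objects = [
    {"type": "Person", "name": "Joe Smith", "street": "123 St.",
     "city": "Palo Alto", "country": "USA"},
    {"type": "Person", "name": "Jane Smith",
     "city": "Palo Alto", "country": "USA"},
    {"type": "Company", "name": "XYZ Inc.", "street": "233 Avenue Ave.",
     "city": "Mountain View", "country": "USA", "industry": "Services"},
    {"type": "Company", "name": "Hal Corp.", "street": "433 Jupiter Orbit",
     "country": "USA", "industry": "Software"},
]


def query(store, types, **equals):
    """Return objects whose type is one of `types` and whose attributes match.

    Objects that lack an attribute named in `equals` simply do not match.
    """
    wanted = {t.lower() for t in types}
    return [
        obj for obj in store
        if obj["type"].lower() in wanted
        and all(obj.get(key) == value for key, value in equals.items())
    ]


names = [o["name"] for o in query(objects, ["Person", "Company"], city="Palo Alto")]
print(names)
```

Supporting a third type means adding one more string to the list of types; there is no new table to join.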
Before discussing object references, let's take a moment to look at the topology of a different kind of database: the World Wide Web.
The conceptual model of the Web is straightforward: a decentralized store of servers, which presents a single “interface” to the world through the HTTP protocol. Servers contain sets of pages, which can be either private or accessible to the public Internet, and each page can reference any number of other pages through the use of hyperlinks. Hyperlinks are one-way references that do not enforce referential integrity. They can't, since the referenced page may exist outside the current server and therefore outside the control of the person creating the link. The HTTP protocol has well-defined semantics to deal with link failures, redirects, etc., through return codes.
The following diagram is an example of three websites where pages reference each other:

In the figure, Site A contains three public pages that reference pages of both Site B and Site C, along with two “private” pages (that require authentication) that reference each other and are also referenced by one of the public pages of the site. Site B contains five public pages that reference each other and pages in Site A and Site C. Finally, Site C contains three public pages, one of which references a page in Site B, along with two private pages that reference each other but are not referenced by the public pages. Inside companies, the private pages of a site form what's generally referred to as an intranet.
Any HTTP client, be it a web browser or a web robot, can traverse the store defined by the Web, following links, gathering data, or performing operations on it. A client can only modify pages that it owns (e.g., a person can only modify their own website, a bot can only modify pages to which it has write access), and it can only access public pages unless it has privileges to access the private pages of a certain site.
Hyperlinks may work for a certain time, then go “dark” (and possibly become live again), or be modified to redirect to a different object.
This may all sound a little obvious—after all, we know how the Web works! What are we talking about here?
In most databases, the schema is stored in the database itself in some form, while it is up to the application code to determine how and when that content is displayed. Specifically, the notion of “private” content (and its various levels of privacy) and “public” content is not inherent in any database.
XNS is exactly the opposite: it does not impose a schema, yet it has a built-in notion of public and private content. Applications can create both public and private content. Private content is accessible only to the application, whereas public content can be queried, and referenced, by any application, including the one that created it. For example
$note = XN_Content::create("Note");
$note->title = "the note's title...";
$note->description = "the text of the note";
$note->save();
creates and saves a note to XNS. By default, content created in XNS is public, but making it private is as simple as adding a call in the object creation code as follows:
$note = XN_Content::create("Note");
$note->title = "the note's title...";
$note->description = "the text of the note";
$note->isPrivate = true;
$note->save();
Content references in XNS are also simple to create. Let's say you'd like to create a list of notes, but instead of storing the information in a single object you'd like to reference notes created for a separate purpose in a different part of your application.
$notelist = XN_Content::create("NoteList");
$notelist->my->addContent('note',$note1);
$notelist->my->addContent('note',$note2);
$notelist->my->addContent('note',$note3);
Content references in XNS work much more like hyperlinks than foreign keys do in a relational database. The creator of the reference is “responsible” for it. It is not the responsibility of the referenced object to maintain pointers back to objects that reference it. References can be created to any public content, regardless of which application owns it, but private content can only be referenced (and queried) by the owning application. If a referenced object is removed or made private, the reference will simply stop working, just like on the Web when a linked document is removed or hidden behind a firewall or password. XNS ensures that as long as a referenced object exists and is public, references created to it will remain valid. In a relational database, foreign keys in a table must exist in the referenced table, and constraints also make sure that a row in a table that is referenced from another table can't be removed unless/until the referencer is. XNS has no concept of keys: all attributes can be queried on, and object references are managed transparently.
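The hyperlink-like behavior of references can be modeled in a few lines. The Store class below is a toy written for this explanation, with invented method names; it only illustrates the rules just described (one-way references, no integrity enforcement, private content visible only to its owner) and is not how XNS is implemented.

```python
class Store:
    """Toy content store with public/private objects and one-way references."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def save(self, owner, data, is_private=False):
        object_id = str(self._next_id)
        self._next_id += 1
        self._objects[object_id] = {"owner": owner, "private": is_private, "data": data}
        return object_id

    def delete(self, object_id):
        # Referrers are neither checked nor notified, just like a removed web page.
        self._objects.pop(object_id, None)

    def resolve(self, object_id, requesting_app):
        """Follow a reference; return None if it is dead or not visible."""
        obj = self._objects.get(object_id)
        if obj is None:
            return None
        if obj["private"] and obj["owner"] != requesting_app:
            return None
        return obj["data"]


store = Store()
note_id = store.save("notes-app", {"title": "a note"})
secret_id = store.save("notes-app", {"title": "secret"}, is_private=True)

# The list only holds IDs; the referenced notes know nothing about it.
note_list = {"type": "NoteList", "notes": [note_id, secret_id]}

seen_by_other = [store.resolve(i, "other-app") for i in note_list["notes"]]
seen_by_owner = [store.resolve(i, "notes-app") for i in note_list["notes"]]

store.delete(note_id)
after_delete = store.resolve(note_id, "other-app")
print(seen_by_other, seen_by_owner, after_delete)
```

Deleting the note succeeds even though a list still points at it; the reference simply stops resolving, which is exactly what a foreign key constraint would have prevented.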
If we look at the object reference map of an XNS application, we see a picture that is remarkably close to the example of web references we saw earlier:

In the diagram, the Restaurant Reviews application (which has both public content, shown in green, and private content, shown in blue) allows users to create reviews for restaurants, and it uses some information from the California Restaurants application (which also includes some private content). The Super Ratings App encapsulates only ratings of different things, including restaurants, and thus references both (along with other ratings applications). But what's interesting here is the pointer from the California Restaurants application back into the Restaurant Reviews app. This would mean that the creator of California Restaurants realized that Restaurant Reviews was adding information to their content, and they modified their application to include some of the review information directly within California Restaurants, which improves their own listing and gives people more incentive to use the Restaurant Reviews application. The result: users win, and both applications win as well.
Something important to point out is that even though the “pointers” between objects could be created using explicit object references, as discussed in this section, in many cases simple queries will do. The ability of applications to query and display public content that matches certain attributes or values can also be understood as a kind of “soft” reference that enables more flexible and dynamic kinds of applications.
XNS Objects include a Universal ID, and developers can rely on the ID to create, for example, a URL with a parameter that includes the ID and thus lets them quickly reference an object. However, developers should not rely on the structure of IDs. XNS only guarantees that IDs will be globally unique strings within XNS, not that they will always be integers or character arrays or other types of data.
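In practice, not relying on the structure of IDs means treating them as opaque strings: escape them when building a URL and read them back unchanged. The sketch below shows one way to do that with Python's standard urllib.parse module; the URL, the parameter name "id", and the sample ID value are all made up for the example.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def detail_url(base_url, object_id):
    """Build a URL carrying the ID as an escaped query parameter."""
    return f"{base_url}?{urlencode({'id': object_id})}"


def id_from_url(url):
    """Recover the ID exactly as it was, without parsing or casting it."""
    return parse_qs(urlparse(url).query)["id"][0]


# A deliberately awkward, invented ID: the code makes no assumption about its shape.
sample_id = "abc:Note/42 x"
url = detail_url("http://example.com/note.php", sample_id)
round_trip = id_from_url(url)
print(url, round_trip)
```

Code written this way keeps working whether IDs turn out to be integers, compound strings, or something else entirely.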
First: the XNS query mechanism, possible because of the way in which XNS “understands” data, is much closer to what people would do than SQL. This means that it's easier for developers to map from a user query (sent through a form, by clicking on a hyperlink, etc) with XNS than with SQL. Simplicity and flexibility mean much faster (and easier) development and more power to create complex behaviors in applications.
The second point is that because XNS does not require a schema, in effect applications can be much more flexible about the data they manage. Creating an application that allows end-users to define their own fields (e.g., a flexible calendaring application, ToDo lists, etc.) in addition to those defined by the developers, becomes difficult using basic relational model tools—in essence, a developer would have to create a mechanism like that provided by XNS natively. Queries and data management are simpler, and applications can be both more flexible and more powerful, without making them harder to build.
Third, object references in XNS are more akin to URLs than to table relations. While not including a schema, XNS does support the notions of private and public content, similar to public and private web pages. This increases the usefulness of the data and underlines the difference in object references between XNS and traditional stores or databases.
Finally, XNS is flexible enough that object references may not be needed: queries (even queries with dynamic as opposed to static parameters) can in many cases serve as soft references against both public and private content.
XNS provides a flexible way to define data and its properties, while leaving the semantics up to the application that uses it, be it the one that created the data or not (in the case of public content), and simplifies the creation of Ning applications to give users new ways to use, create, and relate information.
Appendix A: A bit of relational theory
The relational model uses at its core the notion of a domain, which is typically mapped to the abstraction of the set of possible values of a column in a database, not to a column itself. A domain consists basically of a name, type, and values. The name is an identifier for the set of values, chosen arbitrarily (e.g., USER_NAME). The type defines how the values are interpreted (e.g., VARCHAR(50)), and the values are all the possible elements that belong in the domain. Mapped to an actual database, the domain is not a specific column of a table but the abstract notion of that column. Relation schemas build on the notion of domains. Relation schemas are defined by a set of attributes and a name for the schema. Each attribute belongs to a domain. For example, a typical schema may be USER[USER_NAME, USER_ID, USER_EMAIL]. The degree of a relation is the number of attributes in it.
These abstractions and strict definitions allow the relational model to use mathematical operations to determine (and later, optimize) actual data manipulation processes.
Each relation schema can have instances, alternatively called relations or relation instances. A relation or relation instance is a set of tuples (also called n-tuples). Each relation can be understood as a table in the database, and each tuple as a row in the table.
So far we have described basic relations, i.e., individual tables. The next step is the definition of constraints.
Constraints can be defined for various elements in the Relational Model, the most basic being the constraint on the values of a domain (e.g., use of type integer, or floating point number). One of the crucial constraints is the uniqueness constraint.
We have previously defined that a relation is a set of tuples, and by definition all elements of a set are unique. By specifying that tuples form a set, the relational model does not impose uniqueness for every value in a tuple, only the tuple as a whole must be unique, and in practice it is clear that various attributes will have duplicate values across the relation. This leads to the notion of a key, which must be unique for the tuple. While theoretically the key can be a combination of all the attributes, this may not be enough in many cases. Keys can be natural (i.e., one attribute—such as country name—or the combination of two preexisting attributes—such as person name+phone number—known to be unique in combination) or artificial values (i.e., sequentially generated IDs) that ensure their uniqueness for every tuple. Mathematically, keys are expressed as a subset of the attributes of a relation that cannot be removed without violating the uniqueness constraint of the set of tuples present in a relation.
One of the minimal keys in the relation is arbitrarily defined as the primary key, a key whose values are used to uniquely identify tuples in the relation. A relation may have more than one key, and it must have one and only one primary key. Primary keys define the entity integrity constraint, which states that a value of the primary key can never be null (if this were possible, the tuples may not satisfy the uniqueness constraint).
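The definition of a key as a minimal set of attributes that preserves uniqueness can be checked mechanically. The Python sketch below does so against a single relation instance, using the USER schema mentioned earlier with invented rows. Keep in mind that in the theory a key is a property of the schema (it must hold for every possible instance), so a check like this can only disprove that a set of attributes is a key, never fully prove it.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_minimal_key(rows, attrs):
    """True if attrs is unique across rows and no proper subset of attrs is."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Invented rows for the USER[USER_NAME, USER_ID, USER_EMAIL] schema.
users = [
    {"user_id": 1, "user_name": "joe", "user_email": "joe@example.com"},
    {"user_id": 2, "user_name": "joe", "user_email": "joe2@example.com"},
    {"user_id": 3, "user_name": "jane", "user_email": "jane@example.com"},
]

id_is_key = is_minimal_key(users, ("user_id",))
name_is_key = is_minimal_key(users, ("user_name",))
id_and_name_is_key = is_minimal_key(users, ("user_id", "user_name"))
print(id_is_key, name_is_key, id_and_name_is_key)
```

Here user_id qualifies, user_name does not (two rows share "joe"), and the pair (user_id, user_name) is unique but not minimal, since user_id alone already identifies every row.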
When we consider multiple relations, we end up with a relational database schema. The schema contains a set of relation schemas, which can be interconnected by relations themselves, and an instance of the schema is the actual database, composed of relation instances.
Within a schema, relations are defined using foreign key constraints. An attribute in a relation is a foreign key if it shares the domain with the primary key of another relation. The referential integrity constraint of schemas defines that values in the foreign key attribute must exist as values in the primary key attribute of the referenced relation.
The database schema itself satisfies the constraints placed on a relational schema, but at a higher level. For example, the tuples formed by connecting two relations must also form a set, and therefore it must be possible to find a minimal set of attributes that form a primary key for the combined relations.
Last updated by Mike Nicholson Aug 1.
© 2008 Created by Ning Developer Admin