NoSQL – Nearshore Software Development Company – IT Outsourcing Services

NoSQL vs SQL databases. What’s the difference? When to choose SQL database, and when NoSQL?

Piotr Rzeznik — Thu, 02 Mar 2023 13:18:58 +0000

Go To:

1. SQL and NoSQL database types
2. What is SQL?
3. What is a relational database?
4. How do SQL relational databases work
5. Relationship types
6. NoSQL database – how does it work?
7. Choosing between SQL and NoSQL
8. Summary

NoSQL vs SQL database types

The most popular types of databases include:

relational databases (e.g. MySQL)
and non-relational databases (e.g. MongoDB, Oracle NoSQL database).

What is SQL?

SQL is a language, but its purpose is quite different from that of general-purpose languages such as Java or C#. SQL stands for Structured Query Language. It is a query language created for one specific job: to access, store and modify data in relational databases.

What is a relational database?

A relational database is a type of database that stores data in tables structured into columns and rows. Data in one table can be related to data in another table of the same database, which allows you to query data from different tables at the same time.

A relational database is based on the relational model and is managed by an RDBMS (Relational Database Management System). SQL is by far the most popular language for creating and running queries in such systems.

How do SQL relational databases work

A relational database is based on the relational model, in which data is mapped to one or more tables (or “relations”) made up of columns and rows. Each row in a table has a unique identifier, the primary key, by which it can be referenced. In turn, each database table represents some type of entity (an example of an entity could be “customer”).

The rows of the table show a specific instance of this entity (e.g. customer – John Doe), and the columns, also known as attributes, show the details of a given object (e.g. name, address). The relationships themselves are nothing more than matching data in different tables on the basis of primary and foreign keys.
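
The idea of entities, attributes and key matching can be sketched with Python's built-in `sqlite3` module. The table names `customer` and `purchase`, their columns and the sample rows are assumptions made for illustration; only the "customer – John Doe" entity comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,   -- unique identifier of each row
        name    TEXT NOT NULL,         -- attribute
        address TEXT                   -- attribute
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),  -- foreign key
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'John Doe', '1 Main St');
    INSERT INTO purchase VALUES (10, 1, 'keyboard'), (11, 1, 'monitor');
""")

# The relationship is resolved by matching the foreign key to the primary key.
rows = conn.execute("""
    SELECT c.name, p.item
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.id
    ORDER BY p.id
""").fetchall()
print(rows)
```

Each row of `customer` is one instance of the entity, and the join simply pairs rows whose key values match.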

Relationship types

The main types of relationships are:

1-to-1

A one-to-one relationship between two tables occurs when each record from the first table has exactly one record from the second table assigned to it, and vice versa. To define a one-to-one relationship, store the primary key value from the first table in the second table as a foreign key that is also unique.

1-to-many

A one-to-many relationship also exists between two tables. It occurs when a single record from the first table can have one or more records from the second table associated with it, while each record in the second table is associated with only one record from the first table.

Many-to-many

A many-to-many relationship also exists between two tables. A single record from the first table can have one or more records from the second table associated with it, and vice versa. In a relational database, a many-to-many relationship is usually implemented with a third (junction) table that holds the foreign keys of both tables.
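
The three relationship types can be expressed as constraints. The sketch below uses `sqlite3`; all table names (`person`, `passport`, `phone`, `course`, `enrollment`) and the data are hypothetical examples, not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- 1-to-1: the foreign key is UNIQUE, so a person has at most one passport
    CREATE TABLE passport (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL UNIQUE REFERENCES person(id)
    );

    -- 1-to-many: several phone rows may point at the same person
    CREATE TABLE phone (
        id        INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(id),
        number    TEXT NOT NULL
    );

    -- many-to-many: a junction table pairs persons with courses
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollment (
        person_id INTEGER NOT NULL REFERENCES person(id),
        course_id INTEGER NOT NULL REFERENCES course(id),
        PRIMARY KEY (person_id, course_id)
    );

    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO passport VALUES (1, 1);
    INSERT INTO phone VALUES (1, 1, '555-0100'), (2, 1, '555-0101');
    INSERT INTO course VALUES (1, 'SQL'), (2, 'Graphs');
    INSERT INTO enrollment VALUES (1, 1), (1, 2), (2, 1);
""")

# A second passport for the same person violates the 1-to-1 constraint.
try:
    conn.execute("INSERT INTO passport VALUES (2, 1)")
    second_passport_allowed = True
except sqlite3.IntegrityError:
    second_passport_allowed = False

# Many-to-many lookup goes through the junction table.
sql_students = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM person AS p
    JOIN enrollment AS e ON e.person_id = p.id
    JOIN course AS c ON c.id = e.course_id
    WHERE c.title = 'SQL'
    ORDER BY p.name
""")]
print(second_passport_allowed, sql_students)
```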

NoSQL database – how does it work?

Non-relational databases are also called NoSQL databases. The name comes from a different approach to storing and retrieving data than in SQL-based relational databases. Note that some non-relational databases support SQL.

NoSQL databases are characterized by their ability to handle large amounts of unstructured data. NoSQL solutions are nothing new, but over the last several years they have been rapidly gaining popularity because they can handle large volumes of data, e.g. from mobile devices, IoT or Big Data systems.

Relational vs non-relational database. The key difference

The structure

SQL databases store data in tables with a fixed set of columns; each record is a row.
NoSQL databases store data using the following data models:
document databases (JSON documents)
key-value stores (key – value pairs)
wide-column stores
graph databases

Schema

SQL databases require a fixed, predefined schema. All data must have the same or similar structure. As a result, it is often necessary to gather the initial requirements for the system before starting work. In addition, flexibility may suffer, because modifications of the structure (migrations) can be complicated.
NoSQL databases have a dynamic schema suited to unstructured data. A fixed schema definition is not required, which makes it easier to make changes to the structure.
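
The difference can be sketched as follows. The relational side uses `sqlite3`; the document side is only a plain Python list of dictionaries standing in for a document collection, not a real NoSQL client. The `product` table and its fields are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Fixed schema: a field the table does not know about is rejected.
try:
    conn.execute("INSERT INTO product (id, name, color) VALUES (1, 'mug', 'red')")
    fixed_schema_rejected = False
except sqlite3.OperationalError:
    fixed_schema_rejected = True

# The structure has to be migrated before the new field can be stored.
conn.execute("ALTER TABLE product ADD COLUMN color TEXT")
conn.execute("INSERT INTO product (id, name, color) VALUES (1, 'mug', 'red')")
row = conn.execute("SELECT name, color FROM product").fetchone()

# Dynamic schema: documents in one collection may carry different fields.
products = [
    {"_id": 1, "name": "mug", "color": "red"},
    {"_id": 2, "name": "e-book", "formats": ["epub", "pdf"]},
]
print(fixed_schema_rejected, row, [sorted(p) for p in products])
```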

Scalability

SQL databases are typically scaled vertically (so-called scale-up). This means that to handle more data or load on a single server, you need to increase its RAM, CPU performance or SSD capacity. Scaling relational databases across servers is more difficult. To maintain data integrity in transactions in a multi-server SQL database, you need a backend that synchronizes all write operations and transactions and avoids deadlocks (mutual blocking of operations).

NoSQL databases are characterized by horizontal scalability (scale-out). This means that scaling is done by increasing the number of servers. JOIN operations combine related pieces of data; NoSQL databases are typically not designed to handle them efficiently, although some do support them. Because data can reside on different servers, joining data across servers can be troublesome. NoSQL enables easy scaling through data sharding: a routing layer redirects each query to the appropriate shard, which makes NoSQL databases highly scalable and allows for quick query handling.
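
A routing layer of this kind can be sketched in a few lines. `ShardRouter` below is a hypothetical, in-memory illustration in which each "shard" is just a dictionary; real systems add replication, rebalancing and metadata servers.

```python
import hashlib


class ShardRouter:
    """Toy routing layer: picks a shard from a hash of the key."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        # A stable hash, so the same key always lands on the same shard.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        # Only one shard is consulted for a single-key lookup.
        return self.shards[self.shard_for(key)].get(key)


router = ShardRouter(3)
for i in range(100):
    router.put(f"user-{i}", i)
print(router.get("user-42"), [len(s) for s in router.shards])
```

Note that plain modulo hashing moves most keys when the number of shards changes, which is one reason production systems use more elaborate schemes.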

Queries

The SQL language has been around for several decades (it originated in the 1970s), which is why it is widely used, popular and has a good reputation. It is extremely efficient when it comes to querying, manipulating and retrieving data from relational databases. It is also declarative: you describe what result you want, not how to compute it. Another advantage of SQL is that it is quite easy to learn, so business analysts or other employees who are not programmers can use it without major problems.

Querying NoSQL databases may not be as simple as using SQL in relational databases, because it usually requires additional data processing and there is no single declarative query language. Therefore, tasks involving NoSQL are usually performed by programmers.

How you run queries in a NoSQL database largely depends on the database. For example, in MongoDB, to request data from a JSON document database, you pass a document with the properties that the results should match to the function db.collection.find().
Other popular solutions include implementing query functionality directly at the application layer (rather than at the database layer) or using MapReduce, a programming model that facilitates the processing of big data.
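
The "query by example" idea can be illustrated in plain Python. The helper `find` below is hypothetical and only imitates the simplest equality form of such a filter on an in-memory list; it is not the MongoDB driver API.

```python
def find(collection, query):
    """Return the documents whose fields equal every key/value pair in query."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


# Hypothetical sample documents.
products = [
    {"name": "mug", "color": "red", "price": 10},
    {"name": "mug", "color": "blue", "price": 12},
    {"name": "plate", "color": "red", "price": 8},
]

red_mugs = find(products, {"name": "mug", "color": "red"})
print(red_mugs)
```

An empty query document matches everything, which mirrors how an unfiltered lookup returns the whole collection.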

Choosing between SQL and NoSQL

Now that we know the main differences between SQL and NoSQL, let’s try to answer the question: When to use relational databases and when to use non-relational databases? As is often the case in IT, the decision depends on many components. In this case, the main points to consider are:

Data types
Database management method
Amount of data

When to use SQL?

Referring to the first component, the type of data: relational databases will perform better than NoSQL databases if data consistency and integrity are key.
It is a common belief that relational databases are not a good choice for handling large amounts of data. This statement is not entirely true. Many databases like MySQL or PostgreSQL can handle large amounts of data very well. However, relational databases have a fixed schema and require structured data. Maintaining such a structure, along with consistency and efficiency, may turn out to be very difficult if we intend to handle Big Data workloads with a relational database.

At first glance, it might seem that fixed data structures are limiting, but there is no rule here. Having a fixed, predefined structure makes SQL databases a better option for handling payment systems or reservation systems. An interesting fact is that most financial institutions rely on relational databases. Relational databases provide transactions, which protect data integrity and correctness. SQL can limit some functionality at times, but on the other hand, it is a very mature and proven technology.

When to use NoSQL?

NoSQL databases can store different kinds of data and they don’t need to be structured in any way. Therefore, non-relational databases provide greater flexibility and are a good choice for handling large amounts of data without a common structure.

Typically, the more extensive your data set is, the more likely a NoSQL database is to be the better choice. Non-relational databases are predisposed to scalability and availability, making them an ideal solution for real-time applications (e.g. online gambling, instant messaging).

How about using multiple databases?

You first need to understand the domain. What effect are you trying to achieve? Nowadays, often the choice between SQL and NoSQL is not a question of which database to use, but of when and where to use each of these databases within the same application or system.

Personally, I am working on an application in which the use of NoSQL was, without going into details, the most sensible choice, but the same application also required reports. To avoid unnecessary problems and extra analysis, I decided to use both types of databases. I used NoSQL for the web and desktop application and SQL for the reports themselves. The information is stored in the NoSQL database and only the data required for the reports is transferred to the SQL database.
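
A setup like this can be sketched roughly as follows. This is not the author's actual system: the list of dictionaries stands in for documents in the NoSQL store, `sqlite3` stands in for the reporting database, and the `order_fact` table and all field names are assumptions.

```python
import sqlite3

# Stand-in for documents kept in the NoSQL database (shapes may differ).
orders = [
    {"_id": "a1", "customer": "Ann", "total": 30.0, "items": [{"sku": "mug", "qty": 2}]},
    {"_id": "b2", "customer": "Bob", "total": 12.5, "notes": "gift wrap"},
    {"_id": "c3", "customer": "Ann", "total": 7.5},
]

# Reporting database: only the fields needed for reports are copied over.
reports = sqlite3.connect(":memory:")
reports.execute(
    "CREATE TABLE order_fact (order_id TEXT PRIMARY KEY, customer TEXT, total REAL)"
)
reports.executemany(
    "INSERT INTO order_fact VALUES (?, ?, ?)",
    [(o["_id"], o["customer"], o["total"]) for o in orders],
)

# Reports are then ordinary declarative SQL.
revenue = reports.execute(
    "SELECT customer, SUM(total) FROM order_fact GROUP BY customer ORDER BY customer"
).fetchall()
print(revenue)
```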

To sum up: relational database or non-relational data model?

Choosing the right database is not easy, even for experts, and deciding whether to choose relational or non-relational databases can depend on many factors. You should also consider how many options are available in the market for SQL and NoSQL databases. For example, if you have a large amount of unstructured data, CouchDB or MongoDB may be a good solution. However, when high availability is your priority, Redis and Cassandra may be a better choice.

On the other hand, SQL databases offer many advantages in terms of transactions and overall data integrity. Moreover, the relationships within them are easily identified and defined, which makes it easier to draw insights from critical data.

MongoDB – the perfect database system for e-commerce?

Adam Sosinski — Wed, 12 Jan 2022 14:56:00 +0000

When choosing a database management system (DBMS) for your online store, you need to pay attention to a number of different aspects: flexibility, high availability, reliability, handling many queries and keeping data up to date. An example of a popular system addressing these needs is MongoDB, the capabilities of which I will discuss in this article.

E-commerce means more than online stores

In simple terms, e-commerce means commercial transactions conducted electronically on the Internet. By this, we mean sale-purchase transactions, as payment and delivery can be done either online or offline. Online stores are the most popular type of this trade, and for some people they equate to the concept of e-commerce itself. It is worth noting, however, that apart from e-shops, we can also distinguish auction sites, online currency exchanges, electronic banking and betting platforms.

The challenges for your e-commerce database

In the e-commerce industry, databases undertake special tasks.

A well-configured database system should:

guarantee data availability 24/7,
maintain high query throughput during periods of increased usage,
store large amounts of data,
provide information about changes (e.g. the availability of given products from the product catalog) dynamically and on an ongoing basis.

This is especially important during peak sales periods, such as Black Friday or Cyber Monday, that can translate into an increased number of queries. For this reason, e-commerce companies should focus on database scalability.

Which to choose: relational databases or non-relational databases?

Let’s take a closer look at data storage for e-commerce services. We can choose from several kinds of databases, the best-known of which are relational (SQL) and non-relational (NoSQL). Let’s take a look at the differences between them. To be more precise, SQL is the Structured Query Language, a language for retrieving data from a relational database. However, this type of database has come to be called an “SQL database”, so I will use this term for the purposes of comparison. It also makes it easier to remember the name of the second type, the NoSQL database, a name often expanded as “not only SQL”.

There are 5 basic differences between them:

SQL	NoSQL
clearly defined data relationships	no enforced relationships; the data is loosely coupled
data is stored in tables	data is stored as documents, graphs or key-value pairs
predefined schema	dynamic schema, unstructured data
preferred for multi-row transactions	preferred when the speed of data retrieval is important
vertically scalable	horizontally scalable

As you can see, the NoSQL databases perfectly match the requirements and needs of the e-commerce market in terms of data availability and storage. Currently, the most popular database system of this type is MongoDB.

What is MongoDB?

MongoDB is a document database designed for ease of development and scaling. Documents are stored in BSON (Binary JSON) format. Thanks to this JSON-like format, it’s very easy to convert queries and results into a format that the frontend code of the e-commerce application can “understand”. It is also more readable for humans. This NoSQL solution offers hierarchical data, automatic sharding (fragmentation), and built-in replication for better scalability and high availability.

Now that we have a picture of what the main challenges are in e-commerce and are sure that MongoDB is a good choice for data storage, let’s learn more about how MongoDB can support the e-commerce industry.

Advantages of NoSQL databases in e-commerce – based on the example of MongoDB

Dynamic schemas

Thanks to dynamic schemas, the documents in the collection do not have to have the same fields, and a given field can have different types depending on the document. This increases the flexibility of mapping to entities or objects. However, practice shows that the structure of documents inside the collection is similar. To guarantee this, MongoDB introduced the ability to set validation rules per collection.
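
The idea of per-collection validation rules can be illustrated with a minimal sketch. The `RULES` format and the `validate` helper are hypothetical and much simpler than MongoDB's own validation syntax; they only show the principle of checking required fields and types while leaving extra fields free.

```python
# Hypothetical rule set: required field name -> accepted Python type(s).
RULES = {"name": str, "price": (int, float)}


def validate(doc, rules=RULES):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected in rules.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"wrong type for {field}")
    return problems


# Extra fields are allowed, so documents can still differ from each other.
ok = validate({"name": "mug", "price": 9.99, "color": "red"})
bad_type = validate({"name": "mug", "price": "9.99"})
missing = validate({"price": 5})
print(ok, bad_type, missing)
```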

Easy hierarchization of data

Thanks to the use of the JSON format, it’s easy to structure the data. You can do this by embedding one document in another or by providing references. The choice of method should be considered individually for each collection. Embedding is recommended because it allows you to obtain data with a single query, which improves the system’s performance. References are worth considering for more complex hierarchy representations or when the benefits of embedding do not outweigh the effects of data duplication (such as the need to update every copy when the data changes).
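
The trade-off between the two methods can be shown with plain dictionaries. The order and customer documents below are invented for illustration, and the dictionary `customers` stands in for a second collection.

```python
# Embedding: the customer data lives inside the order document.
order_embedded = {
    "_id": 1,
    "customer": {"name": "Ann", "city": "Gdansk"},
    "total": 30.0,
}

# Referencing: the order stores only the customer's identifier.
customers = {7: {"_id": 7, "name": "Ann", "city": "Gdansk"}}
order_referenced = {"_id": 1, "customer_id": 7, "total": 30.0}


def city_embedded(order):
    # One lookup: everything needed is in the same document.
    return order["customer"]["city"]


def city_referenced(order, customer_collection):
    # Two lookups: the order first, then the referenced customer.
    return customer_collection[order["customer_id"]]["city"]


# The customer moves; only the referenced copy is updated in one place.
customers[7]["city"] = "Warsaw"
print(city_embedded(order_embedded), city_referenced(order_referenced, customers))
```

The embedded copy still shows the old city, which is the duplication cost mentioned above.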

Replication

MongoDB utilizes a concept called Replica Set, which is a set of nodes containing the same data. This enables data replication, the purpose of which is to increase availability and protect against database server failures. A properly designed architecture also allows for faster access to data.

We will discuss the key assumptions and replication mechanisms on the basis of the diagram below.

The replica set consists of one Primary member and a number of Secondary members. There can also be a special member of such a set, the Arbiter, which does not hold a copy of the data but votes when a new Primary has to be selected because the main server is unavailable.

Write operations are performed only on the Primary instance, from which the built-in MongoDB mechanism then copies the data to the other instances.

By default, read operations also go through the Primary instance, but it’s possible to configure reads so that the Secondary servers handle queries. This may lead to so-called eventual consistency, i.e. reads that return data which has not yet been updated.

Replica set members also use a heartbeat mechanism. Each of the nodes (members) polls the others every 2 seconds to check their availability. If the main server is unavailable, a new one is selected.

Deploy a Replica Set — MongoDB Manual

This process consists in selecting a successor from the remaining instances, taking their configured priority into account. According to the documentation, a replica set can have up to 50 members, of which only 7 can participate in the selection process (voting); the successor is chosen from among them. Other servers, named Non-Voting members, must have the properties votes and priority set to 0. An odd number of voting members is recommended; hence, the recommended minimum number of nodes in a replica set is 3.
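
The role of votes, priority and the odd-number recommendation can be illustrated with a heavily simplified model. This is not MongoDB's election protocol, which also considers factors such as how up to date each member is; `Member` and `elect_primary` are hypothetical names, and the rule "highest priority among reachable members wins if a majority of votes is reachable" is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    priority: float = 1   # 0 means the member can never become Primary
    votes: int = 1        # 0 means a Non-Voting member
    reachable: bool = True


def elect_primary(members):
    """Return the name of the new Primary, or None if no majority is reachable."""
    total_votes = sum(m.votes for m in members)
    reachable_votes = sum(m.votes for m in members if m.reachable)
    if reachable_votes * 2 <= total_votes:
        return None  # no strict majority of voters, so nobody can be elected
    candidates = [m for m in members if m.reachable and m.priority > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.priority).name


a = Member("a", priority=2)
b = Member("b", priority=1)
arbiter = Member("arbiter", priority=0)  # votes but holds no data

all_up = elect_primary([a, b, arbiter])
a.reachable = False
after_failure = elect_primary([a, b, arbiter])
b.reachable = False
no_majority = elect_primary([a, b, arbiter])
print(all_up, after_failure, no_majority)
```

With three voters, losing one still leaves a majority of two; with only two voters, losing one would leave no majority, which is why an odd count is recommended.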

Fragmentation

Fragmentation, which MongoDB calls sharding, means the process of dividing a data set into smaller pieces. In doing so, you can scale your database horizontally, practically without any limits. For fragmentation, MongoDB uses a cluster that consists of:

Shard – a replica set that holds part of the collection’s data (chunks),
Router (mongos) – it works a bit like a load balancer and, based on the cluster configuration, forwards operations to the appropriate shard,
Config server – which stores the metadata and the cluster configuration.

The relationship between the components is presented in the following diagram:

For fragmentation, it is important to choose the right key and strategy. When selecting the document field that you wish to use as the key, you need to consider:

Cardinality – how many distinct values the key has, i.e. into how many parts the collection can be divided,
Frequency – whether any value appears much more often than the others,
Monotonicity – whether new key values increase or decrease steadily, which should be avoided,
Query frequency – the key should be used in the most frequent queries.

When it comes to strategies, there are two to take advantage of:

Hashed Sharding

With this strategy, MongoDB automatically computes a hash of the key field values. It works well when the key values change monotonically, because hashing gives a more even distribution of documents between shards. The disadvantage is that for range queries it is unlikely that all matching documents will be in one shard. The router cannot determine which shards hold the requested documents, so it has to query all of them.

Ranged Sharding

Each of the shards holds parts of the collection within a given range of key values. This strategy works well when the set of values for the key is large and no single value repeats often. The main advantage is that a query can be targeted at a specific shard, which significantly improves query speed. A built-in MongoDB mechanism (the balancer) divides the data into chunks and allocates them to shards; it aims for an even distribution and tries to keep the chunks similar in size. When deciding on fragmentation, remember that MongoDB does not have an option allowing you to merge the data back; you can only run fragmentation again using a different key.
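
The difference between the two strategies for range queries can be sketched as follows. The shard count, the range boundaries in `RANGE_BOUNDS` and all function names are assumptions made for illustration; this is not how MongoDB stores or computes its chunk ranges internally.

```python
import bisect
import hashlib

SHARDS = 3
# Hypothetical boundaries: shard 0 holds keys < 100, shard 1 holds 100..199,
# shard 2 holds keys >= 200.
RANGE_BOUNDS = [100, 200]


def ranged_shard(key):
    """Shard index for a key under ranged sharding."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hashed_shard(key):
    """Shard index for a key under hashed sharding."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % SHARDS


def shards_for_range_ranged(low, high):
    """Shards that must be contacted for low <= key <= high (ranged)."""
    return set(range(ranged_shard(low), ranged_shard(high) + 1))


def shards_for_range_hashed(low, high):
    """With hashing, neighbouring keys are scattered, so every shard is asked."""
    return set(range(SHARDS))


print(shards_for_range_ranged(120, 180))   # a single shard
print(shards_for_range_ranged(90, 110))    # two neighbouring shards
print(shards_for_range_hashed(120, 180))   # all shards
```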

Change streams

As of version 3.6, MongoDB allows you to listen for changes in a selected collection, and since version 4.0 also in a database or the entire deployment, except for the admin, local and config databases. This is done by opening a cursor, which allows you to iterate through the events related to a given scope. Since this mechanism uses the aggregation framework, you can also listen only for specific changes or modify the received notifications. The basic requirement is to use a replica set, as a notification is emitted once the change has been saved on the majority of the members that are responsible for data storage.

Change streams use the oplog, a special capped (size-limited) collection, to obtain information on operations that have an impact on the current state of the data. Documents in this collection rotate, which means that when the collection reaches its size limit, the oldest entries are deleted to make room for new ones. Therefore, you should choose the appropriate size for this collection, depending on the frequency of events, so that you can capture a given event before it is removed.
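
The rotation behaviour, and why the size matters for a listener that falls behind, can be illustrated with a small model. `CappedLog` is a hypothetical class; it limits the log by number of entries, whereas the real collection is limited by size, and the sequence numbers are an assumption of this sketch.

```python
from collections import deque


class CappedLog:
    """Toy capped log: keeps only the newest max_entries events."""

    def __init__(self, max_entries):
        self.entries = deque(maxlen=max_entries)  # oldest items drop off
        self.next_seq = 0

    def append(self, event):
        self.entries.append((self.next_seq, event))
        self.next_seq += 1

    def read_from(self, seq):
        """Return events with sequence >= seq, or raise if they were rotated out."""
        oldest = self.entries[0][0] if self.entries else self.next_seq
        if seq < oldest:
            raise LookupError("requested events were already rotated out")
        return [event for s, event in self.entries if s >= seq]


log = CappedLog(max_entries=3)
for i in range(5):
    log.append(f"e{i}")

recent = log.read_from(2)      # still available
try:
    log.read_from(0)           # a listener that lagged too far behind
    lost_events = False
except LookupError:
    lost_events = True
print(recent, lost_events)
```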

Conclusion

According to predictions, the dynamic development of e-commerce in Poland will continue for the next few years. Customers’ requirements for websites or applications are growing. The most important factors in improving the Customer Experience include availability, speed and reliability. A properly configured database system such as MongoDB is resistant to failures, scalable, and allows you to structure and store large amounts of data hierarchically, so it fulfils the needs of many e-commerce projects.

Neo4j – an invitation to graph databases

Marcin Jawor — Tue, 26 Nov 2019 09:01:00 +0000

The popularity of graph databases is still growing. But how do you choose the right database? And will graph databases work well in every project? In today’s article, I would like to take a look at the advantages of graph databases and encourage as many people as possible from the programming industry to take advantage of all the benefits they can bring to a developer’s daily work. I will pay special attention to my favorite representative of this genre, which is Neo4j.

Go To:

1. What is a graph database?
2. Database selection
3. Advantages of Neo4j
4. What does the fact that Neo4j is a NoSQL database mean?
5. Bus connection search engine – case study
6. The troublesome RDBMS
7. RDBMS full of Joins
8. RDBMS full of subqueries
9. Case study: Using a Neo4j graph
10. Summary

What is a graph database?

Neo4j (https://neo4j.com/) is one of the most popular graph databases, if not the most popular. As a reminder, and to organize your knowledge: a graph is composed of two types of elements, nodes and relationships. A node can have one or several types (labels) and has its own properties. Relationships, apart from a name and their own properties, have, most importantly, a direction. The properties mentioned above are collections of key – value pairs used to store important information. For example, if the node is a person, its properties can be: given name, last name, age or list of favorite books.

Relationships between nodes in Neo4j – as in graph databases in general – are just as important as the nodes in terms of data. We treat them as objects, the existence of which is determined by the presence of given nodes. The independent existence of relationships has no justification.
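
The property graph model described above can be sketched with two small data classes. `Node` and `Relationship` are hypothetical names for illustration and do not correspond to any Neo4j driver API; the sample person and book are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # one or several types
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    name: str
    start: Node                                      # direction: start -> end
    end: Node
    properties: dict = field(default_factory=dict)


ann = Node({"Person"}, {"given_name": "Ann", "age": 34})
book = Node({"Book"}, {"title": "Graph Databases"})

# A relationship cannot be built without both of its nodes.
likes = Relationship("LIKES", ann, book, {"since": 2019})
print(likes.start.properties["given_name"], likes.name, likes.end.properties["title"])
```

Because `start` and `end` are required arguments, the sketch also reflects the point that a relationship has no independent existence without its nodes.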

Relationships known from RDBMS (Relational Database Management System)-type databases are usually associated with marking data in a row of one table as reflected in a particular row of another table. This allows for cascading operations when editing or deleting data. When normalizing a data model in RDBMS, we can come across the necessity of introducing special intermediate tables between the two tables used for binding whole groups of rows together.

Database selection

Many communities may still cling to the belief that RDBMS are ideally suited to all kinds of tasks. Decision-makers responsible for database selection may have negative attitudes toward alternative data storage methods, and Neo4j has long been considered such an alternative. How can you reasonably choose the database(s) to use in a given project? First and foremost, the choice should be determined by the tasks facing your application.

As early as at the stage of application design – at the very stage of database selection – it is worth considering several issues:

What operations will be performed on the collected data in the future?
Will the application’s task be to save and read data without any of the more complicated operations?
Or maybe the most important issue will be mutual relationships between the data, and the program should help to analyze them?

Therefore, database selection depends on the basic function of the application being designed, as well as on the nature of the data. Will it be the data of people employed in a particular organization, who co‑create its organizational structure, and will the application you are creating be used to ensure efficient document circulation between employees? Or maybe you have to develop a flight search engine? Or an application that will provide support for a transport company in terms of logistics? Or maybe you received a super-secret order from government special services to implement a system supporting the management of a network of agents and informants? I admit the last example is indeed slightly over the top, but this is mainly because the possibilities provided by graph databases are really enormous.

However, the principle of matching a database to the purpose of a given application is not always observed, and the supporters of new and often more appropriate solutions have to struggle with the skepticism of managers (“After all, no one uses these solutions”).

Advantages of Neo4j

How surprised those critics must be when they visit the Neo4j project’s home page and see the logos of numerous world-famous brands and government or research institutions, from medical companies to scientific and research institutes to financial, transport, telecommunication and military organizations. A full, extensive list and case studies are available on the project website. This is a treat for all data analysis and modeling professionals as well as application architects.

As you can see, the popularity of Neo4j and graph databases is still growing. In many rankings, including my own, the Neo4j database is one of the leaders of such solutions. It is worth mentioning that the creators of Neo4j took good care of friendly organization of clustering and cloud computing, and the most common solution is Neo4j hosted on AWS – which also speaks in its favor.

It is worth remembering that Neo4j – as a tool – is strongly supported by both modern code writing tools and popular frameworks, such as Spring Framework in the “Spring Data Neo4j” project. The fact that graph databases are being developed so intensively bodes well for the future.

What does the fact that Neo4j is a NoSQL database mean?

Just as databases from the RDBMS group use the SQL query language, Neo4j uses the Cypher language. Both are declarative languages. While Cypher is syntactically similar to SQL in many respects, one of the most frequently indicated differences is the use of the MATCH keyword instead of SELECT. Another is that relationships are written literally as arrows, e.g. (a)-[:KNOWS]->(b).

Cypher is a very flexible language in terms of query building capabilities. This is well demonstrated in examples like this one, where we perform conditional matching. In SQL, we always place the condition in the WHERE clause, whereas in Cypher it can be included right at the stage of node declaration. We can create interesting and useful queries composed of stages, using the WITH clause.

The readability of queries, which is very important and often even crucial, is something we find here. Also noteworthy is how easy it is to write queries for people who understand graphs and have previously encountered SQL query syntax.

Bus connection search engine – case study

The example of a ticket booking application presented below is a real-life example. The problem I faced came up a few years ago, mainly due to decision-makers’ attachment to solutions considered to be proven, and the development team’s reluctance to search for new possibilities.

Project: bus ticket booking application.

End product: seat booking and ticket purchase. Due to the excessive complexity of the topic, however, let’s limit ourselves to the connection search engine only.

Operation of the application: before booking a ticket, the user should first indicate the bus stop from which they want to depart, as well as select the destination stop from the list of available stops.

Modeling: in the case of a RDBMS-compliant approach, we will need at least three tables to model this area. The first one will be a register of bus stops and the second one will be a register of tracks on which the stops are located. In the third table, we will assign particular stops to particular tracks.

Figure 1. RDBMS-compliant table and relationship chart

The troublesome RDBMS

Here we can notice a typical problem with a table containing mapped relationships between individual rows of bound tables. The rows in the relationship contain cells filled with unreadable numbers usually belonging to the indexed table primary keys in the relationship. Deciphering the origin and meaning of these numbers can sometimes require considerable effort. The more complex the table, the more effort will be needed. In our example, such a bonding table could look like this:

Table 1. Sample fragment of possible content of a ‘track_bus_stop’ table

We can find such “creations” in Many-to-Many relationships. They can be particularly burdensome when there are numerous tables, defined foreign keys and relationships in the project. They also prove problematic when scripts have to contain a plethora of data to populate test instances of databases for integration tests that verify the implementation. Such a situation is often crucial in terms of development. It is often necessary to work really hard to prepare new data records while maintaining the relationships with intermediate tables. The data contained in the rows must be unique or match the master data set, and it must be consistent with other “non-null” data from at least ten other tables. It may happen that circular references start to appear in the data model, e.g. as a result of inattention. Then, without disabling all constraints on the database, it is not possible to continue adding new data or even to import correct data into it.

RDBMS full of Joins

Let’s move on to the first step using the connection search engine. A passenger will search for all possible tracks assigned to a selected bus stop.

Listing 1. Query in SQL to search for the names of all tracks to which the sample “Pstrągowa” bus stop belongs

What interesting thing has just happened? At this stage, data from three separate tables has been merged in the query into one result set, and we can now choose the rows that contain the stop we are looking for. Each track is defined by the N stops that make it up, so the merged result contains one row for every pair of track and stop; we reject all the rows in which the stop is different from the one we indicated.

Below is an example of a result table before stripping away the rows with a different bus stop name than the one given:

Table 2. Table showing a set of the most important data from the track and bus stop tables compiled together using the ‘track_bus_stop’ table

Having rejected the rows in which the bus stop name does not correspond to the one we indicated, we obtain a set of rows that must then be trimmed by removing any repetitions.

Table 3. Set of unique track names to which the indicated bus stop belongs
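The three stages described above (merge the tables, keep the rows for the chosen stop, drop repetitions) can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the query from Listing 1: the column names, the ids and every name other than “Pstrągowa” and track_bus_stop are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE bus_stop (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE track_bus_stop (track_id INTEGER, bus_stop_id INTEGER);
    INSERT INTO track VALUES (1, 'Track A'), (2, 'Track B'), (3, 'Track C');
    INSERT INTO bus_stop VALUES (10, 'Pstrągowa'), (11, 'Stop X'), (12, 'Stop Y');
    INSERT INTO track_bus_stop VALUES
        (1, 10), (1, 11), (2, 10), (2, 12), (3, 11), (3, 12);
""")

# Stage 1: merge the three tables into one result through the junction table.
JOINED = """
    FROM track AS t
    JOIN track_bus_stop AS ts ON ts.track_id = t.id
    JOIN bus_stop AS s ON s.id = ts.bus_stop_id
"""
all_rows = conn.execute("SELECT t.name, s.name" + JOINED).fetchall()

# Stages 2 and 3: keep only rows for the chosen stop, then drop repetitions.
track_names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT t.name" + JOINED + "WHERE s.name = ? ORDER BY t.name",
        ("Pstrągowa",),
    )
]
```

In this sample data the merged result has six rows, one per junction row, and only two tracks pass through “Pstrągowa”.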

RDBMS full of subqueries

Let’s imagine now that our passenger needs a list of all the bus stops that can be reached by boarding a bus at the indicated stop. Below is a sample query that meets the expectations of the application user.

Listing 2. Sample query returning a list of unique bus stops that can be reached from stop X

In the above query, we have to use two subqueries for a single projection. In total, there are six table-linking operations using the JOIN command. The complexity of this query is an undoubted disadvantage. Another is that SQL constructs are not intuitive when we want to extract a slice of the reality we are modelling. The result of the query in our example is the data set shown in the table below.

Table 4. Sample result set of bus stops that the user can reach from the given initial stop
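For readers who want to experiment, here is a simplified sketch of such a query, again using Python's built-in sqlite3 module. It is not the query from Listing 2 and uses fewer joins. It assumes a hypothetical stop_order column in track_bus_stop that records each stop's position on a track, and it ignores transfers between tracks. All names other than “Pstrągowa” and track_bus_stop are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bus_stop (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- stop_order is an assumed column: position of the stop along the track.
    CREATE TABLE track_bus_stop (
        track_id INTEGER, bus_stop_id INTEGER, stop_order INTEGER
    );
    INSERT INTO bus_stop VALUES
        (10, 'Pstrągowa'), (11, 'Stop W'), (12, 'Stop X'),
        (13, 'Stop Y'), (14, 'Stop Z');
    INSERT INTO track_bus_stop VALUES
        (1, 11, 1), (1, 10, 2), (1, 12, 3), (1, 13, 4),  -- track 1
        (2, 10, 1), (2, 14, 2),                          -- track 2
        (3, 12, 1), (3, 13, 2);                          -- track 3
""")

REACHABLE = """
    SELECT DISTINCT dest.name
    FROM track_bus_stop AS dest_link
    JOIN bus_stop AS dest ON dest.id = dest_link.bus_stop_id
    JOIN (
        -- Subquery: every track the origin stop lies on, with its position.
        SELECT ts.track_id, ts.stop_order
        FROM track_bus_stop AS ts
        JOIN bus_stop AS s ON s.id = ts.bus_stop_id
        WHERE s.name = ?
    ) AS origin ON origin.track_id = dest_link.track_id
    WHERE dest_link.stop_order > origin.stop_order
    ORDER BY dest.name
"""
reachable = [row[0] for row in conn.execute(REACHABLE, ("Pstrągowa",))]
```

Even in this reduced form, the query needs a subquery and three joins to answer a question that takes one sentence to ask.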

Queries from both previous listings are only an introduction to other operations leading to a ticket purchase. Further queries may sometimes be even more complex – especially once we create additional tables to store information about ticket prices depending on the chosen departure and destination stop, type of track, run, night-time or e.g. periodic discounts on given sections of the journey.

Case study: Using a Neo4j graph

Let’s take a look at the analyzed data model from the perspective of graph objects. What is the significance of the fact that something can be described using a graph? As I said at the beginning, a graph is a set of nodes and their mutual directed relationships. In the context of the analyzed bus connection search engine, our sample nodes and relationships can be presented in this way using Neo4j:

Figure 2. Graph showing the bus stops as nodes, along with their mutual defined relationships

The stops are nodes, and the road that leads from one stop to another naturally determines the relationship between them. Each road between stops is served by particular tracks, so we treat the track as a property of the relationship that the road defines.

Listing 3. Query in Cypher that allows us to search for all tracks to which the indicated bus stop has been assigned

Sample result of the above query:

Table 5. Sample result of the query used to find all tracks to which the indicated bus stop belongs

At first glance, the Listing 3 query is much more intuitive than the SQL one. It is certainly also less complicated. In the above query, we try to select all LEADS TO-type relationships to other bus stops – those coming out of the one whose name corresponds to the stop indicated in the query. We then return their tracks using the RETURN keyword.
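The same idea can be tried without a Neo4j instance. The sketch below is plain Python, not Cypher, and only mimics the model described above: stops are nodes, each directed road is a relationship, and the track is a property of that relationship. The stop and track names other than “Pstrągowa” are hypothetical.

```python
# Each tuple is one directed relationship: (from_stop, to_stop, track).
# The track plays the role of a property stored on the relationship.
roads = [
    ("Stop W", "Pstrągowa", "Track A"),
    ("Pstrągowa", "Stop X", "Track A"),
    ("Stop X", "Stop Y", "Track A"),
    ("Pstrągowa", "Stop Z", "Track B"),
    ("Stop X", "Stop Y", "Track C"),
]


def tracks_leaving(stop_name, relationships):
    """Collect the tracks on all relationships that start at the given stop."""
    return sorted({track for src, _dst, track in relationships if src == stop_name})


tracks = tracks_leaving("Pstrągowa", roads)
```

Because only outgoing relationships are inspected, a stop that is the last one on a track would not report that track. This matches the description of selecting relationships "coming out of" the indicated stop.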

Let’s go to step two, in which the user gets a list of the stops that the bus will pass through. In Neo4j, we can get this list by executing, for example, a query like the one presented in Listing 4.

Listing 4. The query will return the names of all the bus stops that can be reached from the one indicated by the user.

We can interpret the above query as follows: “Since there is a relationship between the indicated bus stop and the subsequent ones in the form of a road connecting them, then return all the subsequent stops to me until the last one.”

Below is a sample result in the form of a graph and a table of names:

Figure 3. Graph presenting possible destination stops and their mutual relationships

Table 6. Set of node
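The traversal that this kind of query expresses, following the road relationships from stop to stop until none are left, can also be sketched in plain Python as a breadth-first search. This is an illustration of the idea and not of how Neo4j executes it. The adjacency dictionary and all stop names other than “Pstrągowa” are assumptions.

```python
from collections import deque

# Hypothetical directed graph: each stop maps to the stops its roads lead to.
roads = {
    "Stop W": ["Pstrągowa"],
    "Pstrągowa": ["Stop X", "Stop Z"],
    "Stop X": ["Stop Y"],
    "Stop Y": [],
    "Stop Z": [],
}


def reachable_stops(start, graph):
    """Return every stop reachable from start, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt not in seen:  # guards against revisiting stops in cyclic graphs
                seen.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


destinations = reachable_stops("Pstrągowa", roads)
```

Unlike the relational version, no junction table or position column is needed here. The relationships themselves carry the route.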

As we can see in the above examples, queries in Cypher are much shorter than their SQL equivalents presented in the earlier listings, while producing identical results. An additional advantage is that Neo4j, apart from the tabular view, also offers a view of nodes and their relationships.

Summary

If the slice of reality we are working on is a collection of objects and their mutual relationships, it is very likely that we will find a graph structure there, and therefore using a graph database will make sense. In this article, I wanted above all to present how intuitive and easy it is to construct queries in Neo4j. I provided an example of an application where I faced a mismatch between the tool and the project. If I had had my current experience and knowledge of graph databases back then, I would have tried to convince the project’s decision-makers to use Neo4j. This would have protected the client from numerous unnecessary problems.