A document database is a type of nonrelational database that is designed to store and query data as JSON-like documents. Document databases make it easier for developers to store and query data in a database by using the same document-model format they use in their application code.
Example:
Try writing down, on a piece of paper, your full name, your age, and the top 3 movies you have seen, along with their actors and directors.
Name: Derese Getachew
Age: Counting
Movies:
- Title: Top Gun
  - Actors: Tom Cruise
  - Director: Tony Scott
  - Link: https://en.wikipedia.org/wiki/Top_Gun
- Title: Black Panther
  - Actors: Chadwick Boseman, Michael B. Jordan
  - Director: Ryan Coogler
  - Link: https://en.wikipedia.org/wiki/Black_Panther_(film)
Let's represent the same information using JSON:
{
  "name": "Derese Getachew",
  "age": 32,
  "movies": [
    {
      "title": "Top Gun",
      "actors": "Tom Cruise",
      "director": "Tony Scott",
      "link": "https://en.wikipedia.org/wiki/Top_Gun"
    },
    {
      "title": "Black Panther",
      "actors": "Chadwick Boseman, Michael B. Jordan",
      "director": "Ryan Coogler",
      "link": "https://en.wikipedia.org/wiki/Black_Panther_(film)"
    }
  ]
}
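To illustrate the point that application code and the database can share one document shape, here is a minimal Python sketch that loads the document with the standard `json` module and reads nested fields directly. The keys are written in lowercase here, and the variable names (`raw`, `person`, `titles`) are illustrative assumptions rather than part of any database API.

```python
import json

# A copy of the example document as JSON text (keys lowercased for this sketch).
raw = """
{
  "name": "Derese Getachew",
  "age": 32,
  "movies": [
    {"title": "Top Gun", "actors": "Tom Cruise", "director": "Tony Scott",
     "link": "https://en.wikipedia.org/wiki/Top_Gun"},
    {"title": "Black Panther", "actors": "Chadwick Boseman, Michael B. Jordan",
     "director": "Ryan Coogler",
     "link": "https://en.wikipedia.org/wiki/Black_Panther_(film)"}
  ]
}
"""

# A JSON object becomes a dict and a JSON array becomes a list.
person = json.loads(raw)

# Nested fields are reached by walking the structure; no joins are needed.
titles = [movie["title"] for movie in person["movies"]]
directed_by_coogler = [
    movie["title"]
    for movie in person["movies"]
    if movie["director"] == "Ryan Coogler"
]

print(titles)
print(directed_by_coogler)
```

The whole person, including the movie list, travels as one unit, which is the property a document database builds on.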
The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time:
- Consistency means that every read returns the most recent write.
- Availability means that reads and writes always succeed. In other words, each non-failing node returns a response in a reasonable amount of time.
- Partition tolerance means that the system continues to function even when the network between nodes drops or delays messages (a network partition).
The CAP theorem therefore sorts systems into three categories:
- Consistency and availability.
- Consistency and partition tolerance.
- Availability and partition tolerance.
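The trade-off between the last two categories can be sketched with a toy two-replica store. This is a deliberately simplified illustration, not how any real database is built: the class names, the `mode` flag, and the `partitioned` switch are all assumptions made for the example, and real CP systems typically rely on majority quorums rather than refusing every write.

```python
class Replica:
    """One node holding a single value."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy store with two replicas; mode is "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True means a and b cannot talk

    def write(self, node, value):
        other = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability: refuse the write so replicas never disagree.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the other replica becomes stale.
            node.value = value
            return
        # No partition: replicate to both nodes before returning.
        node.value = value
        other.value = value

    def read(self, node):
        return node.value


# CP store: the write during the partition is rejected.
cp = TwoNodeStore("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
    cp_write_ok = True
except RuntimeError:
    cp_write_ok = False

# AP store: the write succeeds, but the replicas now disagree.
ap = TwoNodeStore("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")

print(cp_write_ok, cp.read(cp.a), cp.read(cp.b))
print(ap.read(ap.a), ap.read(ap.b))
```

In the CP run both replicas still agree on `v1` but the client saw an error; in the AP run the client was served, at the cost of one replica returning stale data.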
Many NoSQL databases follow the BASE approach. BASE stands for:
- Basically available: The database should be available most of the time.
- Soft state: Temporary inconsistency is allowed.
- Eventual consistency: The system will come to a consistent state after a certain period.
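Soft state and eventual consistency can be sketched as two replicas that accept writes independently and later reconcile. The sketch below assumes a simple "highest version wins" rule and a hypothetical `sync` function; real systems use more careful mechanisms (for example vector clocks or replication logs), so treat this only as an illustration of the idea.

```python
class LwwReplica:
    """Replica that keeps, per key, the value with the highest version."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Only accept the write if it is newer than what we already hold.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(r1, r2):
    """Exchange entries in both directions; the newest version wins on each side."""
    for key, (version, value) in list(r1.data.items()):
        r2.write(key, value, version)
    for key, (version, value) in list(r2.data.items()):
        r1.write(key, value, version)


east, west = LwwReplica(), LwwReplica()
east.write("cart", ["book"], version=1)
west.write("cart", ["book", "pen"], version=2)

# Soft state: the replicas temporarily disagree.
before = (east.read("cart"), west.read("cart"))

# Eventual consistency: after reconciliation they converge.
sync(east, west)
after = (east.read("cart"), west.read("cart"))

print(before)
print(after)
```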
The structure of the data and the relationships within it are called the schema of the data.
- Schema on Read
  - The data is stored in its raw state (as it arrives), and structure is applied by the query code when the data is read.
  - Keeping data in its raw format gives us the freedom to adapt to future changes and to work with the data without loss of information.
- Schema on Write
  - Structure is enforced as a condition before data is written to the data store.
  - Requires extensive data modeling at the beginning.
How a schema is designed can influence how a data store behaves.
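The two approaches can be contrasted with a small Python sketch. Everything here is hypothetical and chosen for illustration: the `REQUIRED` schema, the function names, and the use of plain lists as "data stores".

```python
import json

REQUIRED = {"name": str, "age": int}  # hypothetical schema for this sketch


def write_with_schema(store, record):
    """Schema on write: reject records that do not match before storing."""
    for field, expected in REQUIRED.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"{field!r} must be {expected.__name__}")
    store.append(record)


def write_raw(store, line):
    """Schema on read: store the raw text exactly as it arrives."""
    store.append(line)


def read_ages(raw_store):
    """Structure is applied here, at query time; unusable rows are skipped."""
    ages = []
    for line in raw_store:
        try:
            ages.append(int(json.loads(line)["age"]))
        except (ValueError, KeyError, TypeError):
            continue
    return ages


strict, raw = [], []

write_with_schema(strict, {"name": "Ada", "age": 36})
try:
    write_with_schema(strict, {"name": "Bob", "age": "unknown"})
    rejected = False
except ValueError:
    rejected = True  # the bad record never reaches the store

write_raw(raw, '{"name": "Ada", "age": 36}')
write_raw(raw, '{"name": "Bob", "age": "41"}')  # age arrives as text
write_raw(raw, "not json at all")               # stored anyway

ages = read_ages(raw)
print(len(strict), rejected, len(raw), ages)
```

The strict store stays clean but loses Bob's record entirely, while the raw store keeps everything and leaves the cleanup to each query.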
Question: What happens when you have to change a column type in a relational database? Do you drop the table, recreate it, and load all the data again? What if there are foreign keys as well?
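One way to explore the question is with SQLite through Python's standard `sqlite3` module. SQLite has no statement for changing a column's type in place, so the usual approach is to rebuild the table, as sketched below. Other systems differ (PostgreSQL, for example, offers `ALTER TABLE ... ALTER COLUMN ... TYPE`), and the table and column names here are made up for the example. Foreign keys pointing at the table add further steps that this sketch leaves out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original design: age was (unwisely) stored as text.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, age TEXT)")
conn.executemany(
    "INSERT INTO person (id, age) VALUES (?, ?)",
    [(1, "32"), (2, "45")],
)

# Rebuild: create a table with the desired type, copy the rows with a cast,
# drop the old table, and rename the new one into place.
# The "with" block commits on success and rolls back on error.
with conn:
    conn.execute("CREATE TABLE person_new (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.execute(
        "INSERT INTO person_new (id, age) "
        "SELECT id, CAST(age AS INTEGER) FROM person"
    )
    conn.execute("DROP TABLE person")
    conn.execute("ALTER TABLE person_new RENAME TO person")

rows = conn.execute(
    "SELECT id, age, typeof(age) FROM person ORDER BY id"
).fetchall()
print(rows)
```

Every row is rewritten during the copy, which is the kind of up-front cost that schema-on-write designs accept.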
- NoSQL (Not Only SQL)
- Schemaless (although some level of schema design is inevitable)
- Document database
- Schemaless
- Uses BSON to store documents
- Rich query language
- Aggregation framework (aggregation pipeline, map-reduce functions)
- Indexing
- GridFS
- Replication: the process of copying the data of a database instance to different database servers.
- Sharding: a method of distributing data across multiple machines.
- Mongo shell: an interactive JavaScript interface to MongoDB
MongoDB server -> mongod
MongoDB client -> mongo (the legacy shell; newer MongoDB releases ship mongosh instead)
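The sharding idea from the feature list above can be sketched in plain Python as hash-based routing: each document goes to the shard chosen by a hash of its shard key. This is only an illustration of the concept. The function names are hypothetical, plain lists stand in for shards, and MongoDB's own hashed sharding uses its own hash function and chunk ranges rather than a simple modulo.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def insert(shards, document, shard_key):
    """Store the document on the shard selected by its shard-key value."""
    index = shard_for(document[shard_key], len(shards))
    shards[index].append(document)
    return index


def find_by_key(shards, shard_key, value):
    """A query on the shard key only has to look at one shard."""
    index = shard_for(value, len(shards))
    return [doc for doc in shards[index] if doc[shard_key] == value]


shards = [[], [], []]  # three in-memory "servers"
for user_id in range(100):
    insert(shards, {"user_id": user_id, "name": f"user-{user_id}"}, "user_id")

found = find_by_key(shards, "user_id", 42)
sizes = [len(shard) for shard in shards]
print(found, sizes)
```

Replication is the complementary idea: instead of splitting the data, each shard's contents would be copied to several servers so that one failure does not lose them.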