Data Persistence

hassamadhi9594
Jun 12, 2019

In this blog I would like to talk about data persistence.

•Information systems process data and convert them into information

•The data should persist for later use

•To maintain the status

•For logging purposes

•To further process and derive knowledge (data science)

•Data can be stored, read, updated/modified, and deleted

•At run time of software systems, data is stored in main memory, which is volatile

•Data should be stored in non-volatile storage for persistence
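To make the volatile versus non-volatile point concrete, here is a minimal sketch that keeps key-value pairs in a properties file, so the data outlives the object that wrote it. The class name `FileStore` and its methods are made up for this illustration, and the sketch makes no attempt at handling concurrent access.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: every change is written straight to a file on disk.
public class FileStore {
    private final Path file;

    public FileStore(Path file) {
        this.file = file;
    }

    // Load the current contents from disk (empty if the file does not exist yet).
    private Properties load() throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file)) {
                props.load(in);
            }
        }
        return props;
    }

    // Write the full contents back to disk.
    private void save(Properties props) throws IOException {
        try (Writer out = Files.newBufferedWriter(file)) {
            props.store(out, null);
        }
    }

    // Create or update a value.
    public void put(String key, String value) throws IOException {
        Properties props = load();
        props.setProperty(key, value);
        save(props);
    }

    // Read a value; returns null if the key is absent.
    public String get(String key) throws IOException {
        return load().getProperty(key);
    }

    // Delete a value.
    public void delete(String key) throws IOException {
        Properties props = load();
        props.remove(key);
        save(props);
    }
}
```

A second `FileStore` created on the same path sees whatever the first one stored, which is the behaviour plain in-memory variables cannot give you.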

Data

Data is a set of values of subjects with respect to qualitative or quantitative variables. Data and information or knowledge are often used interchangeably; however data becomes information when it is viewed in context or in post-analysis.

Database

A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques.

Database server

A database server is a server which houses a database application that provides database services to other computer programs or to computers, as defined by the client–server model.

Database management system

A database management system (DBMS) is system software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data. A DBMS makes it possible for end users to create, read, update and delete data in a database.

Data persistence techniques

•Data can be stored in

•Files

•Databases

Pros & cons of file system and database

Pros: File System

Performance can be better than storing the same files in a database.

Saving files to the file system and downloading them from it is much simpler.

Migrating the data is an easy process.

It is easy to migrate the data to cloud storage.

Cons: File System

Loosely coupled: files are not tied to the database records that refer to them, so they can become orphaned

Low security

Pros: Database

ACID consistency, including the rollback of an update, which is complicated to achieve when files are stored outside the database.

Files will be in sync with the database and cannot be orphaned, which gives you the upper hand in tracking transactions.

Backups automatically include file binaries.

It's more secure than saving in a file system.
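The rollback mentioned in the first point can be sketched with plain JDBC. This is only an illustration: the `ACCOUNT` table, its columns, and the `TransferExample` class are assumptions of mine, and the code expects an already opened `Connection`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransferExample {
    // Moves an amount between two rows of a hypothetical ACCOUNT table.
    // Either both updates are committed or neither is.
    public static void transfer(Connection con, int fromId, int toId, int amount)
            throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // start an explicit transaction
        try (PreparedStatement debit = con.prepareStatement(
                     "update ACCOUNT set BALANCE = BALANCE - ? where ID = ?");
             PreparedStatement credit = con.prepareStatement(
                     "update ACCOUNT set BALANCE = BALANCE + ? where ID = ?")) {
            debit.setInt(1, amount);
            debit.setInt(2, fromId);
            debit.executeUpdate();

            credit.setInt(1, amount);
            credit.setInt(2, toId);
            credit.executeUpdate();

            con.commit(); // make both changes permanent
        } catch (SQLException e) {
            con.rollback(); // undo any partial work
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit); // restore the previous mode
        }
    }
}
```

With files stored outside the database there is no equivalent of `rollback()`, so undoing a half-finished change has to be coded by hand.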

Cons: Database

You may have to convert the files to blob in order to store them in the database.

Database backups will be larger and take longer.

Memory usage is inefficient.

Data arrangement

•Un-structured

Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents

•Semi-structured

Examples include CSV files and XML and JSON documents

•Structured

Examples include numbers, dates, and strings (groups of words and numbers)
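As a small illustration of moving from a looser arrangement to a structured one, the sketch below turns one CSV line into typed fields. The `Student` class and the assumed line layout `id,name,yyyy-MM-dd` are my own choices for the example. Quoted fields and embedded commas are deliberately not handled.

```java
import java.time.LocalDate;

public class CsvToStructured {
    // Structured form: every field has a fixed name and type.
    public static final class Student {
        public final int id;
        public final String name;
        public final LocalDate enrolled;

        public Student(int id, String name, LocalDate enrolled) {
            this.id = id;
            this.name = name;
            this.enrolled = enrolled;
        }
    }

    // Parses one CSV line of the assumed form "id,name,yyyy-MM-dd".
    public static Student parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Student(
                Integer.parseInt(parts[0].trim()),   // number
                parts[1].trim(),                     // string
                LocalDate.parse(parts[2].trim()));   // date
    }
}
```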

Database types

•Hierarchical databases

•Network databases

•Relational databases

•Non-relational databases (NoSQL)

•Object-oriented databases

•Graph databases

•Document databases

Data warehouse and Big data

Data warehouse

In computing, a data warehouse (DW), also known as an enterprise data warehouse, is a system used for reporting and data analysis, and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources.

Big data

"Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software.

Both hold large amounts of data, are used for reporting, and are managed on electronic storage. A common assumption is therefore that big data will soon replace the older data warehouse. However, big data and data warehousing are not interchangeable, because they are used for different purposes.

Files and DBs are external components

•They exist outside the software system

•Software can connect to the files/DBs to perform CRUD operations on data

•File: file path or URL

•DB: connection string
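To show what these two kinds of locators look like in code, here is a tiny sketch. The file location and the helper names are hypothetical, and the URL follows the shape used by the MySQL JDBC driver. Other drivers use different formats.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExternalResources {
    // A file is located by its path (hypothetical location).
    public static Path dataFile() {
        return Paths.get("data", "students.csv");
    }

    // A database is located by a connection string.
    // This builds a URL in the MySQL JDBC style: jdbc:mysql://host:port/database
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }
}
```

With a suitable JDBC driver on the classpath, a URL like this is what gets passed to `DriverManager.getConnection(url, user, password)`.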

Prepared statements

•The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters.

PreparedStatement pstmt = con.prepareStatement("update STUDENT set NAME = ? where ID = ?");
pstmt.setString(1, "MyName");
pstmt.setInt(2, 111);
pstmt.executeUpdate();
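The "parse once, execute many times" idea is easier to see when the same statement is reused in a loop. The following is a sketch under my own assumptions: the `RenameStudents` class and `renameAll` method are invented for the example, and the `Connection` is expected to be open already.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class RenameStudents {
    // Prepares the update once, then executes it for every (id, name) pair.
    // Returns the total number of rows changed.
    public static int renameAll(Connection con, Map<Integer, String> namesById)
            throws SQLException {
        String sql = "update STUDENT set NAME = ? where ID = ?";
        int updated = 0;
        try (PreparedStatement pstmt = con.prepareStatement(sql)) {
            for (Map.Entry<Integer, String> entry : namesById.entrySet()) {
                pstmt.setString(1, entry.getValue()); // new name
                pstmt.setInt(2, entry.getKey());      // student id
                updated += pstmt.executeUpdate();
            }
        }
        return updated;
    }
}
```

Because the values are bound as parameters rather than concatenated into the SQL text, this style also helps against SQL injection.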

Callable statements

•Execute stored procedures

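CallableStatement cstmt = con.prepareCall("{call anyProcedure(?, ?, ?)}");
// bind the three parameters with cstmt.setXxx(...) or registerOutParameter(...) before executing
cstmt.execute();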
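A slightly fuller sketch of a stored procedure call is shown below. The post does not say what `anyProcedure` takes, so I am assuming two IN parameters (an id and a name) and one integer OUT parameter. The `CallProcedure` class is also just for illustration.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public class CallProcedure {
    // Calls a stored procedure assumed to look like
    // anyProcedure(IN id, IN name, OUT status) and returns the OUT value.
    public static int call(Connection con, int id, String name) throws SQLException {
        try (CallableStatement cstmt = con.prepareCall("{call anyProcedure(?, ?, ?)}")) {
            cstmt.setInt(1, id);                          // first IN parameter
            cstmt.setString(2, name);                     // second IN parameter
            cstmt.registerOutParameter(3, Types.INTEGER); // OUT parameter
            cstmt.execute();
            return cstmt.getInt(3);                       // read the OUT value
        }
    }
}
```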

Object-relational mapping

Object-relational mapping (ORM, O/RM, or O/R mapping) in computer science is a programming technique for converting data between incompatible type systems using object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language.

There are different structures for holding data at runtime

•Application holds data in objects

•Database uses tables (entities)
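What an ORM automates can be sketched by doing the mapping by hand. In this example a row is modelled as a `Map` from column name to value, using the `ID` and `NAME` columns of the `STUDENT` table from the prepared statement example. The `StudentMapper` class is hypothetical and not part of any ORM library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StudentMapper {
    // In the application, data lives in an object.
    public static final class Student {
        public final int id;
        public final String name;

        public Student(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Object -> row: one map entry per column of the STUDENT table.
    public static Map<String, Object> toRow(Student s) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", s.id);
        row.put("NAME", s.name);
        return row;
    }

    // Row -> object: the reverse direction, as needed after a query.
    public static Student fromRow(Map<String, Object> row) {
        return new Student((Integer) row.get("ID"), (String) row.get("NAME"));
    }
}
```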

Mismatches between relational and object models

•Granularity

•Subtypes

•Identity

•Associations

•Data navigation
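The identity mismatch is probably the easiest one to show in a few lines. In a database, two rows are the same when their primary keys match. In Java, `==` compares object references instead. The sketch below, with a made-up `Student` class, bridges the gap by basing `equals` on the key.

```java
public class IdentityMismatch {
    public static final class Student {
        private final int id;      // plays the role of the primary key
        private final String name;

        public Student(int id, String name) {
            this.id = id;
            this.name = name;
        }

        public String getName() {
            return name;
        }

        // Two objects count as the same student when their keys match,
        // which mirrors how the database identifies a row.
        @Override
        public boolean equals(Object o) {
            return o instanceof Student && ((Student) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }
}
```

Two `Student` objects loaded separately for ID 111 are different references, so `a == b` is false, while `a.equals(b)` is true.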

ORM implementations in Java

•JavaBeans

•JPA

JavaBeans build on POJOs. POJO stands for Plain Old Java Object. It is an ordinary Java object, not bound by any special restriction other than those forced by the Java Language Specification, and not requiring any particular class path. POJOs are used to increase the readability and reusability of a program.

A POJO should not:

•Extend pre-specified classes.

•Implement pre-specified interfaces.

•Contain pre-specified annotations.

Beans

•Beans are a special type of POJO. A POJO has to satisfy some restrictions to be a bean.

•All JavaBeans are POJOs but not all POJOs are JavaBeans.

•Beans should be serializable, i.e. they should implement the Serializable interface. Serializable is a marker interface with no methods, so implementing it is not much of a burden.
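Here is a minimal sketch of a class that is both a POJO and a JavaBean. Besides `Serializable`, it follows the usual JavaBeans conventions of a public no-argument constructor and getter/setter pairs. The class name `StudentBean` and its fields are just an example.

```java
import java.io.Serializable;

// A POJO that also follows the JavaBeans conventions:
// public no-argument constructor, private fields, getters and setters,
// and the Serializable marker interface.
public class StudentBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private int id;
    private String name;

    public StudentBean() {
        // no-argument constructor so tools can create instances
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
```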

Java Persistence API (JPA)

•Uses

•POJO classes

•XML-based mapping file (represents the DB)

•A provider (implementation of JPA)

NOSQL AND HADOOP

Not Only SQL (NoSQL)

•Relational DBs are good for structured data

•For semi-structured and un-structured data, some other types of DBs can be used

•Key-value stores

•Document databases

•Wide-column stores

•Graph stores
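To get a feel for the first two data models, they can be imitated with plain Java collections. This is only an analogy and not how any particular NoSQL server is implemented. The keys, field names and values below are invented.

```java
import java.util.List;
import java.util.Map;

public class NoSqlShapes {
    // Key-value store: the value is opaque and is only looked up by its key.
    public static Map<String, String> keyValueEntry() {
        return Map.of("session:42", "user=111;expires=2019-06-12");
    }

    // Document database: a record is a nested document, and documents in the
    // same collection do not have to share the same fields.
    public static Map<String, Object> studentDocument() {
        return Map.of(
                "id", 111,
                "name", "MyName",
                "courses", List.of("Databases", "Java"));
    }
}
```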

Benefits of NoSQL

•When compared to relational databases, NoSQL databases are often more scalable and can provide better performance, and their data model addresses several issues that the relational model is not designed to address:

•Large volumes of rapidly changing structured, semi-structured, and unstructured data

NoSQL DB servers

•MongoDB

•Cassandra

•Redis

•Amazon DynamoDB

•HBase

Information retrieval (IR) is the activity of obtaining information system resources relevant to an information need from a collection. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.

•The information retrieval process should be

•Fast/performance

•Scalable

•Efficient

•Reliable/Correct
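Full-text search is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below is a deliberately small version of that idea. The `InvertedIndex` class is my own illustration, it splits text on non-word characters, and it leaves out ranking, stemming and stop words.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Adds a document: every distinct lower-cased word points back to docId.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Looks up a single term; a lookup does not scan the documents themselves.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

The cost of a search depends on the index lookup rather than on the total size of the collection, which is what makes this structure a good fit for the "fast" and "scalable" requirements.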
