New data-intensive applications such as data analytics, artificial intelligence and the Internet of Things are driving huge growth in business data. This growth brings with it a new set of IT architecture considerations that revolve around the concept of data gravity. In this article, I take a closer look at data gravity and its impact on your company's IT architecture, particularly as you prepare to deploy artificial intelligence and data-intensive machine learning applications.

What is data gravity?

Data gravity is a metaphor introduced into the IT lexicon by a software engineer, Dave McCrory, in a blog post published in 2010.[1] The idea is that data and applications are attracted to one another, much like the attraction between objects explained by the law of gravity. In the current context of enterprise data analytics, as data sets grow larger and larger, they become more and more difficult to move. So the data stays where it is, and the other elements attracted by the data, such as applications and processing power, move to the location where the data resides.

Why should companies pay attention to data gravity?

Digital transformation within businesses, including IT transformation, mobile devices and the Internet of Things, is creating volumes of data that are almost impossible to manage with conventional approaches to analytics. Typically, data analytics platforms and applications reside in their own hardware and software stacks, and the data they use resides in direct-attached storage (DAS). Analytics platforms such as Splunk, Hadoop and TensorFlow like to own the data, so data migration becomes a prerequisite for running analyses.

As companies mature in their data analytics practices, this approach becomes difficult to sustain. When you have huge amounts of data in different enterprise storage systems, it can prove difficult, expensive and risky to transfer it into your analytics clusters. These barriers become even more significant if you want to run cloud analytics on data stored on premises, or vice versa.

These new realities of a world of ever-expanding data sets underscore the need to design enterprise IT architectures in a way that reflects the reality of data gravity.

How can you work around data gravity?

A first step is to design your architecture around a scalable network-attached storage (NAS) platform for data consolidation. This platform must support a wide range of traditional and next-generation workloads and applications that previously used different types of storage. With this platform in place, you can keep your data in one place and bring applications and processing power to the data.
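To make the idea concrete, here is a minimal Python sketch of what "bringing applications to the data" can look like in practice: two unrelated workloads read the same files in place from a consolidated NAS share, rather than each copying the data into its own stack. The mount point, file names and dates are hypothetical.

```python
# Minimal sketch: every workload reads the same files from one consolidated,
# NFS-mounted NAS share instead of copying data into its own local storage.
# The mount point and file names are assumptions for illustration.
from pathlib import Path

import pandas as pd

# Single consolidated location, exported by the scale-out NAS and mounted
# on every compute node (assumed mount point).
DATA_LAKE = Path("/mnt/datalake/sales")

def daily_report(day: str) -> pd.DataFrame:
    """Read one day's transactions in place -- no copy to local DAS."""
    return pd.read_csv(DATA_LAKE / f"transactions_{day}.csv")

def training_features(days: list[str]) -> pd.DataFrame:
    """A second, unrelated workload consuming the very same files."""
    frames = [daily_report(d) for d in days]
    return pd.concat(frames, ignore_index=True)

if __name__ == "__main__":
    print(training_features(["2024-01-01", "2024-01-02"]).describe())
```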

[Figure: data gravity diagram. Source: Dell EMC]

What are the design requirements for addressing data gravity?

Here are some of the high-level design requirements for addressing data gravity.

Security, data protection and resilience

An enterprise data platform must have built-in capabilities for security, data protection and resiliency. Security includes user authentication, authorization, and access control and auditing for data assets. Data protection and resiliency involve protecting the availability of data against disk, node, network and site failures.
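As a rough illustration of the authorization and audit behavior described above, the toy Python sketch below checks a user's group membership against a simple allow-list and logs every access decision. The group names, share paths and policy are assumptions for illustration, not a real platform API.

```python
# Toy sketch of authorization plus access auditing at the application layer.
# Group names, share paths and the policy itself are illustrative assumptions,
# not the API of any real storage platform.
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
audit = logging.getLogger("data-access-audit")

# Hypothetical policy: which groups may read which share.
READ_POLICY = {
    "/mnt/datalake/finance": {"finance", "audit"},
    "/mnt/datalake/telemetry": {"analytics"},
}

def read_dataset(path: str, user: str, groups: set[str]) -> bytes:
    """Return the file contents if the user's groups allow it; audit either way."""
    share = str(Path(path).parent)
    allowed = READ_POLICY.get(share, set())
    if not (groups & allowed):
        audit.info("DENY user=%s path=%s", user, path)
        raise PermissionError(f"{user} may not read {path}")
    audit.info("ALLOW user=%s path=%s", user, path)
    return Path(path).read_bytes()
```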

Security, data protection and resilience must be applied consistently to all applications and data. This uniformity is one of the advantages of keeping a single copy of the data in a consolidated system, as opposed to multiple copies of the same data spread over different systems, each of which must be secured, protected and made resilient independently.

Economical scalability

The data platform must be highly scalable. You might start with 5 TB of storage, then soon find that you need to grow to 50 TB or 100 TB. So, look for a platform that scales seamlessly from terabytes to petabytes.

That said, you need to choose platforms whose staff and infrastructure costs do not scale in tandem with data growth. In other words, a 10-fold increase in data should not result in a 10-fold increase in personnel and infrastructure costs; instead, those costs should grow much more slowly than the data. Storage optimization is one way to achieve this goal.
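A quick back-of-the-envelope model makes this sub-linear cost requirement concrete. In the Python sketch below, the dollar figures and the 0.6 scaling exponent are purely illustrative assumptions, not vendor pricing.

```python
# Back-of-the-envelope sketch of sub-linear cost scaling: a 10x increase in
# data should cost far less than 10x. The base cost and exponent are
# illustrative assumptions only.
def projected_cost(base_tb: float, base_cost: float,
                   new_tb: float, exponent: float = 0.6) -> float:
    """Cost that grows sub-linearly with capacity (exponent < 1)."""
    return base_cost * (new_tb / base_tb) ** exponent

if __name__ == "__main__":
    base_tb, base_cost = 5.0, 10_000.0        # e.g. 5 TB today
    for target_tb in (50.0, 100.0):           # 10x and 20x data growth
        cost = projected_cost(base_tb, base_cost, target_tb)
        print(f"{target_tb:>5.0f} TB -> ~${cost:,.0f} "
              f"({cost / base_cost:.1f}x cost for {target_tb / base_tb:.0f}x data)")
```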

Storage optimization

The data platform should allow performance and capacity to be optimized. This requires a platform that supports storage tiers, so you can keep the most frequently used data on your fastest tiers and rarely used data on lower-cost, higher-capacity tiers. Data movement within the system must happen automatically, with the system deciding where data should be stored according to business rules that users can configure.
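The sketch below illustrates the kind of rule-driven movement described above, expressed as a simple housekeeping script: files on the fast tier that have not been read for a configurable number of days are demoted to a high-capacity tier. Real platforms do this internally and transparently; the tier paths and the 90-day rule here are assumptions for illustration.

```python
# Minimal sketch of rule-driven tiering as a housekeeping script.
# Tier paths and the 90-day idle rule are assumptions; a real platform
# performs this movement internally based on configurable policies.
import shutil
import time
from pathlib import Path

FAST_TIER = Path("/mnt/datalake/hot")      # assumed fast (e.g. all-flash) tier
ARCHIVE_TIER = Path("/mnt/datalake/cold")  # assumed high-capacity, low-cost tier
MAX_IDLE_DAYS = 90                         # configurable business rule

def demote_cold_files() -> None:
    """Move files not read for MAX_IDLE_DAYS from the fast tier to the archive tier."""
    cutoff = time.time() - MAX_IDLE_DAYS * 86_400
    ARCHIVE_TIER.mkdir(parents=True, exist_ok=True)
    for f in FAST_TIER.rglob("*"):
        if f.is_file() and f.stat().st_atime < cutoff:
            target = ARCHIVE_TIER / f.relative_to(FAST_TIER)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))

if __name__ == "__main__":
    demote_cold_files()
```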

Platform support

Data analytics and AI platforms change often, so data must be accessible across different platforms and applications, including the ones you use today and the ones you may use later. This means that your platform must support the data access interfaces most commonly used by data analytics platforms and artificial intelligence software. Examples of such interfaces include NFS, SMB, HTTP, FTP, OpenStack Swift object access for your cloud initiatives, and the native Hadoop Distributed File System (HDFS) for Hadoop application support.
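To illustrate the multi-protocol idea, the sketch below reads the same object through two of the interfaces listed above: a POSIX file path on an NFS- or SMB-mounted share, and the platform's HTTP interface. The host name and paths are hypothetical.

```python
# Illustration of multi-protocol access: one consolidated dataset reached
# through two of the interfaces named above. The mount point, host name and
# object path are hypothetical.
from pathlib import Path
from urllib.request import urlopen

def read_via_mount(relative: str) -> bytes:
    """POSIX-style access: the share is NFS- or SMB-mounted on this node."""
    return (Path("/mnt/datalake") / relative).read_bytes()

def read_via_http(relative: str) -> bytes:
    """The same object fetched over the platform's HTTP interface."""
    with urlopen(f"https://nas.example.com/datalake/{relative}") as resp:
        return resp.read()

if __name__ == "__main__":
    a = read_via_mount("sensors/2024-01-01.json")
    b = read_via_http("sensors/2024-01-01.json")
    assert a == b  # one copy of the data, many access paths
```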

These are some of the key considerations to keep in mind when designing your architecture to address data gravity issues in large-scale data analytics and AI deployments. To take a closer look at the features of a data lake that can help you avoid the architectural pitfalls of data gravity, explore the capabilities of Dell EMC data analytics and AI platforms, all built on Dell EMC Isilon as the foundation of a consolidated data lake.

[1] The Register, "We've heard of data gravity – we're just not sure how to defy it," January 2, 2018. See also Dave McCrory, "Data Gravity – in the Clouds," December 7, 2010.