Big Data

Research and development of a Big Data management and operations platform for satellites, its processing algorithms for highly concurrent access, and its middleware and data interfaces for data reception, management, and distribution

The purpose of developing the satellite Big Data management and operations platform, its processing algorithms for highly concurrent access, and its middleware and data interfaces for data reception, management, and publishing is to build a platform that handles data exchange and management with the satellite user terminals, with the control centers of the satellite gateway stations, and with user applications and industry applications. The platform enables satellite data to be transmitted, exchanged, managed, and published, so that the LEO satellite ground communication network system can be put to commercial use.

The main functional module of the Big Data management and operations platform is the satellite terminal management module, which receives, distributes, and manages the data of satellite user terminals. As shown in Figure 3-17, the satellite terminal management module comprises a middleware controller module, which handles interaction and control between the satellite terminals and the users, a middleware interpreter module, a command forwarding module, and a message notification module. The middleware controller module is responsible for interacting with the different communication links, such as LEO satellite, GSM, GPRS, and RFID. The middleware interpreter module applies custom logic to each received packet, converts it into a standard format, and returns it to the controller. The command forwarding module receives hardware commands sent by users or third parties and forwards each command to the middleware controller that matches the characteristics of the communication link. Depending on those characteristics, links are divided into two types: those that support an immediate response and those that wait for a notification response. The message notification module handles messages received from the hardware: when a message arrives at the middleware controller over a link, the module receives an update notification from the controller and pushes it to the user.
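The interpreter and command forwarding steps can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the platform itself: the wire format `<device_id>|<json payload>`, the `StandardMessage` fields, and the mapping of link names to response types.

```python
import json
from dataclasses import dataclass
from enum import Enum


class LinkMode(Enum):
    INSTANT = "instant"  # the link answers a command immediately
    NOTIFY = "notify"    # the link answers later through a notification


# Hypothetical mapping of link names to response types (illustrative only).
LINK_MODES = {"GPRS": LinkMode.INSTANT, "LEO": LinkMode.NOTIFY}


@dataclass
class StandardMessage:
    device_id: str
    link: str
    payload: dict


def interpret(link: str, raw: bytes) -> StandardMessage:
    """Interpreter step: convert a link-specific packet to the standard format.

    Assumes a made-up wire format "<device_id>|<json payload>".
    """
    device_id, _, body = raw.decode("utf-8").partition("|")
    return StandardMessage(device_id=device_id, link=link, payload=json.loads(body))


def forward_command(link: str, command: dict) -> dict:
    """Forwarding step: tag the command with the reply type of its link."""
    mode = LINK_MODES[link]
    return {"link": link, "command": command, "reply": mode.value}
```

A caller would check the `reply` field to decide whether to wait for the answer in the same call or to expect it later from the message notification module.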

Implementing the satellite Big Data management and operations platform requires completing the following specific technical work:

  1. Build a Web platform for user interaction that handles access authentication and deployment of middleware and communication terminals, and define the specifications for device access, middleware deployment, and Big Data access, so that equipment manufacturers are decoupled from the operations platform;
  2. Develop a Web-based operations management platform covering user management, system health monitoring, and middleware testing and deployment;
  3. Define a unified transport protocol and data interface and open them to third-party developers, so that third-party servers and third-party users can easily obtain the relevant device information;
  4. Big Data system security: build comprehensive security management for Big Data, including authentication, authorization, encryption, and auditing, and provide unified key management, transparent compression, and an overall security solution for the Big Data system;
  5. A highly abstracted Big Data application development platform: abstract the underlying Big Data database schema, the real-time stream processing architecture, and Map-Reduce programming, together with code generation, compilation, deployment, and debugging, into a unified application development platform;
  6. Big Data visualization research: build a large library of functional nodes covering the functions of data mining, including data access, data cleansing, data transformation, data statistics, data mining and data analysis algorithm libraries, graphical display, and so on.


Typical Services of the Big Data Management and Operations Platform

Divided by business logic, the services of the global Big Data operations platform include user and device registration, middleware registration, historical data access, and real-time data access, as shown in Figure 3-18. The key issues in implementing each service are addressed as follows:

Big Data Processing Architecture:

The Big Data platform is built on an independently developed Hadoop-based framework and supports massive data processing, real-time stream processing, unstructured data processing, and large-scale data clusters. The framework of the Big Data center is shown in Figure 3-19. Its main components are the HBase massive database, the Hive data warehouse, the Streaming stream processing module, the Searching global search module, the SQL module, ZooKeeper for coordination and management, the Oozie workflow engine, and so on.


Big Data Center Framework

Processing Algorithm for Highly Concurrent Access

To guarantee high concurrency, the system first improves the algorithm: it screens out rogue devices and, according to defined mechanisms, partitions the terminals on the data network and routes their traffic to multiple cluster servers. The platform is designed with two layers of load balancing: DNS load balancing and LVS software load balancing. For DNS load balancing, the overall external access point of the server cluster is served by dedicated hardware devices, such as F5, that perform this specialized task. Because these devices are independent of the operating system, overall performance improves substantially, and with a variety of load balancing strategies and intelligent traffic management they can meet load balancing requirements very well. LVS software load balancing installs one or more additional software packages on the operating systems of one or more servers to balance the load. Its advantages are that it is tailored to the specific environment, simple to configure, flexible, and low in cost, and it meets general load balancing needs. Combining the two meets the requirements of high concurrency and high load while increasing operational flexibility. The data handled by the platform is large in volume, complex in type, and constantly changing. To ensure that the platform runs stably and that valuable data can be mined from it, the system will use a Hadoop-based Big Data system, whose main features are:

Combined computing and storage architecture: In a traditional data center architecture, computing and storage are separate systems. Data is kept in a central storage system and must be moved to the compute nodes whenever a computation needs it. Over the past 20 years, however, storage performance has improved far more slowly than CPU performance, so storage has become the bottleneck of system performance, and the idea of "moving data to each compute node" is being replaced by the Big Data idea of "moving computation to each data node". This is an important feature of modern Big Data systems: computing and storage are unified, and each data node is both a storage unit and a computing unit.

Commodity server infrastructure: A new architecture loses much of its competitiveness if it costs far more than the traditional data solution. One breakthrough of the Big Data platform is that it uses x86 commodity servers as its basic nodes. With the innovation in x86 CPUs in recent years, their performance has gradually caught up with that of RISC servers, which frees the platform from expensive custom hardware and vendor lock-in. Because each piece of data is stored simultaneously on three or more commodity server nodes, the Big Data system offers availability and reliability equal to or better than those of traditional custom hardware.

Horizontal scalability: This is also an important feature of modern Big Data systems. An enterprise can begin by purchasing only a few nodes according to its storage and computing requirements and then add nodes gradually as business volume, and with it the storage and computing requirements, grows. This reduces the enterprise's one-time upfront investment.
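The idea of moving computation to each data node can be shown with a small sketch, assuming each node holds a list of records and only small per-node summaries travel over the network. The record layout and the function names are hypothetical, and a real deployment would rely on the Hadoop framework rather than on code like this.

```python
from collections import Counter
from typing import Iterable, List


def local_count(records: List[dict]) -> Counter:
    """Runs on the node that stores the records; returns a small summary."""
    return Counter(r["device_id"] for r in records)


def merge(partials: Iterable[Counter]) -> Counter:
    """Combines the per-node summaries; the raw records never move."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Two hypothetical data nodes, each holding its own records.
node_a = [{"device_id": "a"}, {"device_id": "b"}]
node_b = [{"device_id": "a"}]
messages_per_device = merge(local_count(node) for node in (node_a, node_b))
```

Adding a node under this scheme adds both storage and computing capacity at once, which is what makes the gradual, node-by-node expansion described above practical.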

A Big Data system can address the performance problems of traditional databases, data silos, semi-structured and unstructured data, data mining, information management, and the cost of storage and computing.

With openness and scalability in mind, the open platform for this project comprises a load balancing layer, a cloud computing platform, and a Big Data center. The specific architecture is shown in Figure 3-20.


Data Processing Architecture of the Big Data Platform

The load balancing layer includes DNS-based global load balancing and LVS load balancing at the transport layer. Its main function is to communicate with the network control centers and third-party developers and to distribute requests evenly across the different business modules. Access layer: hardware load balancing, LVS software load balancing, and Nginx service-distribution load balancing. The business logic layer is developed on the ACE architecture and includes a login authentication module, a location information processing module, and a device query module. Three kinds of HTTP server interfaces are defined for the needs of the different modules.
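In the platform the balancing itself is done by DNS, LVS, and Nginx. Purely to illustrate the idea of screening out rogue devices and spreading terminals across cluster servers, the following is a minimal sketch that assumes a blocklist of device IDs and a stable hash of the device ID. The function name, the blocklist, and the hashing rule are all hypothetical and are not the platform's actual mechanism.

```python
import hashlib
from typing import Optional, Sequence, Set


def pick_server(device_id: str, servers: Sequence[str], blocked: Set[str]) -> Optional[str]:
    """Return the server for a device, or None if the device is rejected."""
    if device_id in blocked:
        return None  # rogue device: drop before it reaches the cluster
    # A stable hash keeps the same device on the same server across calls.
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

A plain modulo mapping like this reshuffles many devices when the server list changes; it is used here only because it is the shortest way to show the partitioning step.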

Big Data layer: the tentative design combines the non-relational database MongoDB with the in-memory database Redis for data processing. MongoDB mainly stores several kinds of data: data from the devices; user data of the open development platform and the binding relationships between users and devices; and, as temporary storage, the session data for direct transmission between devices and third-party servers. As a development platform, the platform provides a set of API interfaces through which third-party platforms (end users) can obtain the data and develop application systems that meet their own business needs.
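How short-lived session data might be held separately from long-lived records can be sketched with an in-memory stand-in for a key-value store with expiry. This is an assumption-laden illustration: the class name, the time-to-live behavior, and the choice to keep sessions in such a store are hypothetical, and no real Redis or MongoDB client is used.

```python
import time
from typing import Dict, Optional, Tuple


class SessionStore:
    """In-memory stand-in for a key-value store whose entries expire."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[float, dict]] = {}

    def put(self, key: str, value: dict, ttl_s: float, now: Optional[float] = None) -> None:
        """Store a session that expires ttl_s seconds after 'now'."""
        now = time.monotonic() if now is None else now
        self._items[key] = (now + ttl_s, value)

    def get(self, key: str, now: Optional[float] = None) -> Optional[dict]:
        """Return the session, or None if it is missing or has expired."""
        now = time.monotonic() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if now >= expires_at:
            del self._items[key]  # lazily evict the expired session
            return None
        return value
```

The optional `now` argument exists only so the expiry logic can be exercised without waiting for real time to pass.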

The cloud computing platform mainly hosts the different business modules of the open cloud platform, including the operations center, the data collection module, the data distribution module, and the data access layer. The operations center is designed for the platform's administrators and maintainers and includes monitoring, user management, billing statistics, adding and removing platform middleware, and other functions. The data collection module comprises a top layer that authenticates and identifies the device ID and forwards the content, an intermediate middleware layer, and a bottom layer that processes the data and writes it to the Big Data center. The data distribution module includes user authentication, management of the devices assigned to each user ID, user queries of device information, and the cloud computing platform's ability to actively push data to third-party developers. The data access layer mainly defines the interfaces for writing to and reading from the Big Data center, which are called by the data collection module, the data distribution module, and the operations center.

In addition, to better handle highly concurrent access, the real-time data access process and the historical data access process need to be researched and developed, as shown in Figures 3-21 and 3-22.
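The three layers of the data collection module can be sketched as one function that authenticates the device ID, hands the packet to the matching middleware, and writes the result through the data access layer. The names `DataAccessLayer`, `collect`, and `registry`, and the record layout, are assumptions made for illustration only.

```python
from typing import Callable, Dict, List


class DataAccessLayer:
    """Stand-in for the write and read interfaces of the Big Data center."""

    def __init__(self) -> None:
        self._rows: List[dict] = []

    def write(self, row: dict) -> None:
        self._rows.append(row)

    def read(self, device_id: str) -> List[dict]:
        return [r for r in self._rows if r["device_id"] == device_id]


def collect(
    device_id: str,
    raw: bytes,
    registry: Dict[str, str],
    middleware: Dict[str, Callable[[bytes], dict]],
    dal: DataAccessLayer,
) -> bool:
    """Return True if the packet was accepted and stored."""
    # Top layer: authenticate the device ID against the registered devices.
    middleware_name = registry.get(device_id)
    if middleware_name is None:
        return False
    # Middle layer: let the device's middleware parse the raw packet.
    parsed = middleware[middleware_name](raw)
    # Bottom layer: write the processed record to the data center.
    dal.write({"device_id": device_id, **parsed})
    return True
```

Because both the collection and distribution sides would go through the same `DataAccessLayer` interface, the storage behind it could be changed without touching either module.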


Figure 3-21 Real-Time Data Access Process of the Big Data Platform

Figure 3-22 Historical Data Access Process of the Big Data Platform

Building a visual, high-performance analytics platform: At present, big data analytics and distributed computing performance still cannot both be had at once. One direction of the project is to create a new analytical framework for big data analytics that lets data experts build data analysis models visually while also taking advantage of efficient distributed big data computing to obtain valuable information in real time. The project will create a large node library covering the functions of data mining, including data access, data cleansing, data transformation, data statistics, a machine learning algorithm library, graphical display, and so on. Data mining workflows are built by dragging nodes and linking them together, and customers can also create custom nodes for their special needs. For real-time processing, the engines, including Kafka, Storm, and Spark Streaming, are abstracted into a unified set of stages for real-time streaming data: collection, collation, sorting, processing, writing to the database, and a RESTful API for display on Web pages. The project will also develop a proprietary Big Data parallel computing compiler that can compile data mining algorithms into Map-Reduce or Spark programs.
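The node-and-link model can be illustrated with a minimal sketch in which every node is a function from rows to rows and a workflow is an ordered list of nodes. The node names `drop_missing` and `scale` and the row layout are hypothetical; the project's actual node library and visual editor are not described in enough detail to reproduce.

```python
from typing import Callable, List

Rows = List[dict]
Node = Callable[[Rows], Rows]


def drop_missing(field: str) -> Node:
    """Data cleansing node: remove rows where the field is None."""
    return lambda rows: [r for r in rows if r.get(field) is not None]


def scale(field: str, factor: float) -> Node:
    """Data transformation node: multiply the field by a constant."""
    return lambda rows: [{**r, field: r[field] * factor} for r in rows]


def run_pipeline(nodes: List[Node], rows: Rows) -> Rows:
    """Feed the output of each node into the next, in link order."""
    for node in nodes:
        rows = node(rows)
    return rows


readings = [{"t": 1.0}, {"t": None}, {"t": 3.0}]
cleaned = run_pipeline([drop_missing("t"), scale("t", 2.0)], readings)
```

A custom node in this scheme is just another function with the same signature, which is one simple way to let customers extend the library.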

Integrating the "Big Data platform": build a Big Data center for globally associated objects (devices). The ThinkPHP framework is used to build an MVC model for the end-user Web interaction platform, which handles access authentication and deployment of middleware and communication terminals and defines the specifications for device access, middleware deployment, and Big Data access, so that equipment manufacturers are decoupled from the operations platform. Django is used to develop the Web-based operations management platform, covering user management, system health monitoring, and middleware testing and deployment. A unified HTTP transport protocol and data interface are provided to third-party developers, so that third-party servers and third-party users can easily obtain device information. The operator interface of the LEO Big Data platform is shown in Figure 3-23.
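What a unified HTTP data interface for third parties could look like is sketched below as a pure request handler, without any Web framework. The path layout `/devices/<id>`, the token-to-device mapping, and the JSON error bodies are all assumptions for illustration and are not the platform's published interface.

```python
import json
from typing import Dict, Set, Tuple


def handle_get(
    path: str,
    token: str,
    grants: Dict[str, Set[str]],
    devices: Dict[str, dict],
) -> Tuple[int, str]:
    """Return an (HTTP status, JSON body) pair for a device query."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "devices":
        return 404, json.dumps({"error": "not found"})
    device_id = parts[1]
    # A token may only read the devices bound to it.
    if device_id not in grants.get(token, set()):
        return 403, json.dumps({"error": "forbidden"})
    if device_id not in devices:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(devices[device_id])
```

Keeping the handler independent of the Web framework means the same logic could sit behind either the ThinkPHP-facing or the Django-facing side of the platform, at least in principle.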