What are the components of Flume?

Apache Flume example

As explained in the video, from the IT side we can start by asking what requirements the different business areas demand of our solution. Although each organization's needs will be very different, there is nevertheless a series of generic requirements, summarized below:

However, as I explain in the video, we can take these requirements as a starting point and use them as a reference when evaluating the different big data distributions currently on the market.

Hive Hadoop

As a reminder, installation means getting your hands a little dirty, unless the water meter is conveniently located in a basement or garage. Flume includes rubber gloves and a plastic can opener to help you, which is convenient. Once the sensor is attached to the meter (which shouldn't take more than 5 to 10 minutes) and the bridge is connected to your Wi-Fi network via the Flume app, it's a quick job to get everything up and running. Flume prompts you to let the water run for a minute or two to make sure it detects water flow, after which the setup concludes and you can begin seriously monitoring your home's water usage.

Sqoop Hadoop

In the following post we are going to talk about two libraries related to the management of large volumes of data: Apache Flume and Apache Sqoop. Although these two libraries take quite different approaches, the underlying idea of both is the same: to serve as a data ingestion mechanism during the initial data acquisition phase, as already indicated in the previous post, Phases in Big Data and its relationship with Hadoop libraries.
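
To make the contrast concrete, here is a minimal sketch of a Sqoop import; the database URL, credentials, and table name are hypothetical placeholders. Sqoop pulls a relational table into HDFS with a single batch command:

    # hypothetical one-shot import of a relational table into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders

Flume, by contrast, runs as a long-lived agent that streams events continuously, as described in the rest of the post.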

As we have seen, Flume's architecture is based on different components that take part in a data flow as events are generated. This flow is managed by a Flume agent, a process running in a JVM. In addition, the components of the flow must be declared in a configuration file, where Flume lets you specify properties such as the component type, the channel capacity, or the port a source listens on.

Once the Flume executables are installed, the next step is to create a configuration file. A template is provided in the conf directory of the installation. One of the examples we can find in the official Flume user guide is the following:
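
A minimal sketch along those lines: a single agent, a1, wires a netcat source listening on port 44444 to a logger sink through an in-memory channel.

    # example.conf: a single-node Flume configuration

    # Name the components of agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel that buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1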

Flume Computing

Apache Flume is a distributed, reliable, and available system for collecting, aggregating, and moving large amounts of log data from different sources to a centralized repository. If data is not generated continuously and we simply want to bulk-load it once, Flume is probably overkill for the task.

The use of Flume is not restricted to log collection and aggregation. Because sources are customizable, Flume can transport a large number of events, including data generated by social networks, network traffic, email messages, and almost any other configurable data source.

A source consumes events delivered to it by an external source, for example an application server. The external source sends events to Flume in a format recognized by the target Flume source. When a source receives an event, it stores it in one or more channels. The channel is a transient store that holds events until they are consumed by a sink; the file channel, for example, is backed by the local file system. The sink removes the event from the channel and either puts it in an external repository such as HDFS or forwards it to the source of the next agent in the data flow.
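
Assuming the example configuration above was saved as example.conf, this flow can be exercised end to end as a quick sketch: start the agent, then push an event in through the netcat source and watch the logger sink write it to the console.

    # start agent a1 with the configuration above
    bin/flume-ng agent --conf conf --conf-file example.conf \
      --name a1 -Dflume.root.logger=INFO,console

    # from another terminal, send an event to the netcat source
    telnet localhost 44444
    Hello world!

The source stores the typed line as an event in the memory channel, and the logger sink then removes it from the channel and prints it in the agent's console output.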
