
Commit 4612c63

Blog post for Netdata QoS Classes Monitoring
1 parent 2fc0516 commit 4612c63

File tree

2 files changed

+206
-0
lines changed

@@ -0,0 +1,206 @@
---
slug: netdata-qos-classes-monitoring
title: "Netdata QoS Classes monitoring"
authors: satya
tags: [qos, tc, quality-of-service, classes, network-monitoring]
keywords: [qos, tc, quality-of-service, classes, network-monitoring]
image: ./img/stacked-netdata.png
---

![netdata-qos-classes](./img/stacked-netdata.png)

Netdata monitors `tc` QoS classes for all interfaces.

If you also use [FireQOS](http://firehol.org/tutorial/fireqos-new-user/), Netdata will also collect interface and class names.

There is a [shell helper](https://raw.githubusercontent.com/netdata/netdata/master/collectors/tc.plugin/tc-qos-helper.sh.in) for this. All parsing is done by the plugin in `C` code; the shell script only configures the command that is run to get the `tc` output.

The source of the tc plugin is [here](https://raw.githubusercontent.com/netdata/netdata/master/collectors/tc.plugin/plugin_tc.c). It is somewhat complex, because a state machine was needed to keep track of all the `tc` classes, including the pseudo classes that `tc` creates dynamically.

You can see a live demo [here](https://registry.my-netdata.io/spaces/registrymy-netdataio/rooms/local/overview#metrics_correlation=false&modal=&modalTab=&modalParams=&selectedIntegrationCategory=deploy.operating-systems&chartName-val=menu_tc&local--chartName-val=menu_tc).
<!--truncate-->

## Motivation

One category of metrics missing from Linux monitoring is bandwidth consumption per open socket (inbound and outbound traffic). So you cannot tell how much bandwidth your web server, your database server, your backup, your ssh sessions, etc. are using.

To solve this problem, the most *adventurous* Linux monitoring tools install kernel modules to capture all traffic, analyze it and provide reports per application. This is a lot of work, it is CPU intensive, and it carries a great degree of risk, since the kernel modules involved might affect the stability of the whole system. Not to mention that such solutions are probably better suited for a core Linux router in your network.

Others use NFACCT, the netfilter accounting module which is already part of the Linux firewall. However, this would require configuring a firewall on every system on which you want to measure bandwidth. (Just FYI, I do install a firewall on every server, and I strongly advise you to do so too, but configuring accounting on all servers seems overkill when you don't really need it for billing purposes.)

**There is however a much simpler approach**.

## QoS

One of the features the Linux kernel has, but which is rarely used, is its ability to **apply QoS on traffic**. Even more interesting is that it can apply QoS to **both inbound and outbound traffic**.

QoS is about 2 features:

1. **Classify traffic**

    Classification is the process of organizing traffic in groups, called **classes**. Classification can evaluate every aspect of network packets, like source and destination ports, source and destination IPs, netfilter marks, etc.

    When you classify traffic, you just assign a label to it. Of course classes have some properties themselves (like queuing mechanisms), but let's say it is that simple: **a label**. For example **I call `web server` traffic, the traffic from my server's tcp/80, tcp/443 and to my server's tcp/80, tcp/443, while I call `web surfing` all other tcp/80 and tcp/443 traffic**. You can use any combinations you like. There is no limit.

2. **Apply traffic shaping rules to these classes**

    Traffic shaping is used to control how network interface bandwidth is shared among the classes. Normally you need this when there is not enough bandwidth to satisfy all the demand, or when you want to control the supply of bandwidth to certain services. Classification alone is sufficient for monitoring traffic, but traffic shaping is also quite important, as we will explain in the next section.
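
To make the "classification is just a label" idea concrete, here is a minimal Python sketch of the `web server` / `web surfing` example. It is only an illustration: the `Packet` type, the `classify` function and the `default` label are hypothetical and have nothing to do with how the kernel or `tc` represent packets internally.

```python
from dataclasses import dataclass

WEB_PORTS = {80, 443}


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a TCP packet as seen on my server."""
    direction: str  # "in" (arriving at the server) or "out" (leaving it)
    src_port: int
    dst_port: int


def classify(pkt: Packet) -> str:
    """Return the label (class name) for a packet."""
    # The port on my server's side depends on the direction of the packet.
    local_port = pkt.dst_port if pkt.direction == "in" else pkt.src_port
    remote_port = pkt.src_port if pkt.direction == "in" else pkt.dst_port

    if local_port in WEB_PORTS:
        return "web server"   # traffic to/from my server's tcp/80, tcp/443
    if remote_port in WEB_PORTS:
        return "web surfing"  # all other tcp/80 and tcp/443 traffic
    return "default"          # assumed catch-all label for everything else


print(classify(Packet("in", 51000, 443)))  # web server
print(classify(Packet("out", 51000, 80)))  # web surfing
```

Nothing is shaped here. The function only attaches a label, which is all that monitoring per class needs.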

## Why you want QoS

1. **Monitoring the bandwidth used by services**

    Netdata provides wonderful real-time charts, like this one (wait to see the orange `rsync` part):

    ![qos3](https://cloud.githubusercontent.com/assets/2662304/14474189/713ede84-0104-11e6-8c9c-8dca5c2abd63.gif)

2. **Ensure sensitive administrative tasks will not starve for bandwidth**

    Have you tried to ssh to a server when the network is congested? If you have, you already know it does not work very well. QoS can guarantee that services like ssh, dns, ntp, etc. will always have a small supply of bandwidth. So, no matter what happens, you will be able to ssh to your server and DNS will always work.

3. **Ensure administrative tasks will not monopolize all the bandwidth**

    Services like backups, file copies, database dumps, etc. can easily monopolize all the available bandwidth. It is common, for example, for a nightly backup or a huge file transfer to negatively influence the end-user experience. QoS can fix that.

4. **Ensure each end-user connection will get a fair cut of the available bandwidth**

    Several QoS queuing disciplines in Linux do this automatically, without any configuration from you. The result is that new sockets are favored over older ones, so that users get a snappier experience while others are transferring large amounts of traffic.

5. **Protect the servers from DDoS attacks**

    When your system is under a DDoS attack, it receives far more traffic than it can handle and your applications will probably crash. Setting a limit on the inbound traffic using QoS will protect your servers (throttle the requests) and, depending on the size of the attack, may allow your legitimate users to access the server while the attack is taking place.

    Using QoS together with a [SYNPROXY](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md) will provide a great degree of protection against most DDoS attacks. Actually when I wrote that article, a few folks tried to DDoS the Netdata demo site to see the SYNPROXY operation in real time. They did not do it right, but anyway a great deal of requests reached the Netdata server. What saved Netdata was QoS. The Netdata demo server has QoS installed, so the requests were throttled and the server did not even reach the point of resource starvation. Read about it [here](https://github.com/netdata/netdata/blob/master/collectors/proc.plugin/README.md).

On top of all these, QoS is extremely light. You configure it once, and that is it. It will not bother you again and it will not use any noticeable CPU resources, especially on application and database servers.

In short, QoS lets me:

- ensure administrative tasks (like ssh, dns, etc.) always have a small but guaranteed bandwidth. So, no matter what happens, I will be able to ssh to my server and DNS will work.
- ensure other administrative tasks do not monopolize all the available bandwidth. So, my nightly backup will not hurt my users, a developer who is copying files over the net will not get all the available bandwidth, etc.
- ensure each end-user connection gets a fair cut of the available bandwidth.

Once **traffic classification** is applied, we can use **[netdata](https://github.com/netdata/netdata)** to visualize the bandwidth consumption per class in real time (no configuration is needed for Netdata, it will figure it out).

This is QoS from a home Linux router. Check these features:

1. It is real-time (per second updates)
2. QoS really works in Linux: check that the `background` traffic is squeezed when `surfing` needs it.

![test2](https://cloud.githubusercontent.com/assets/2662304/14093004/68966020-f553-11e5-98fe-ffee2086fafd.gif)

---

## QoS in Linux?

Of course, `tc` is probably **the most undocumented, complicated and unfriendly** command in Linux.

For example, did you know that to match a simple port range in `tc`, e.g. all the high ports from 1025 to 65535 inclusive, you have to match all of these value/mask pairs?

```
1025/0xffff
1026/0xfffe
1028/0xfffc
1032/0xfff8
1040/0xfff0
1056/0xffe0
1088/0xffc0
1152/0xff80
1280/0xff00
1536/0xfe00
2048/0xf800
4096/0xf000
8192/0xe000
16384/0xc000
32768/0x8000
```
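
The list above is what you get when the range 1025-65535 is split into power-of-two aligned blocks, each of which can be expressed as a single value/mask pair. A minimal Python sketch of that decomposition follows. The function name `port_range_to_masks` is made up for this illustration and is not part of `tc` or FireQOS.

```python
def port_range_to_masks(first: int, last: int) -> list[str]:
    """Split the inclusive port range [first, last] into value/mask pairs."""
    if not 0 <= first <= last <= 0xFFFF:
        raise ValueError("ports must satisfy 0 <= first <= last <= 65535")

    pairs = []
    port = first
    while port <= last:
        # Largest power-of-two block that can start at `port` (its alignment).
        size = port & -port if port else 0x10000
        # Shrink the block until it no longer runs past `last`.
        while port + size - 1 > last:
            size //= 2
        # Bits covered by the block are wildcards, the rest must match.
        mask = 0xFFFF & ~(size - 1)
        pairs.append(f"{port}/0x{mask:04x}")
        port += size
    return pairs


for pair in port_range_to_masks(1025, 65535):
    print(pair)
```

Calling `port_range_to_masks(1025, 65535)` reproduces the 15 pairs listed above, which is exactly the kind of bookkeeping a helper tool can do for you.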

To do it the hard way, you can go through the [tc configuration steps](#qos-configuration-with-tc). An easier way is to use **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**, a tool that simplifies QoS management in Linux.

## QoS Configuration with FireHOL

The **[FireHOL](https://firehol.org/)** package already distributes **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**. Check the **[FireQOS tutorial](https://firehol.org/tutorial/fireqos-new-user/)** to learn how to write your own QoS configuration.

With **[FireQOS](https://firehol.org/tutorial/fireqos-new-user/)**, it is **really simple for everyone to use QoS in Linux**. Just install the `firehol` package. It should already be available for your distribution. If not, check the **[FireHOL Installation Guide](https://firehol.org/installing/)**. After that, you will have the `fireqos` command, which uses a configuration like the following `/etc/firehol/fireqos.conf`, used at the Netdata demo site:

```sh
# configure the Netdata ports
server_netdata_ports="tcp/19999"

interface eth0 world bidirectional ethernet balanced rate 50Mbit
   class arp
      match arp

   class icmp
      match icmp

   class dns commit 1Mbit
      server dns
      client dns

   class ntp
      server ntp
      client ntp

   class ssh commit 2Mbit
      server ssh
      client ssh

   class rsync commit 2Mbit max 10Mbit
      server rsync
      client rsync

   class web_server commit 40Mbit
      server http
      server netdata

   class client
      client surfing

   class nms commit 1Mbit
      match input src 10.2.3.5
```

Nothing more is needed. You just run `fireqos start` to apply this configuration, restart Netdata, and you have real-time visualization of the bandwidth consumption of your applications. FireQOS is not a daemon: it converts the configuration to `tc` commands, runs them and exits.

**IMPORTANT**: If you copy this configuration to apply it to your system, please adapt the speeds. Experiment in non-production environments to learn the tool before applying it on your servers.
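
When adapting the speeds, one simple sanity check is to compare the sum of the explicit `commit` values with the interface `rate`. The Python sketch below does only that arithmetic on the class lines of the example configuration. It is not a FireQOS parser: the helper names are hypothetical, only `Mbit` values are handled, and how FireQOS itself treats classes without a `commit` is not covered here.

```python
import re


def parse_mbit(value: str) -> int:
    """Parse a rate such as '50Mbit' into whole megabits (Mbit only)."""
    match = re.fullmatch(r"(\d+)Mbit", value)
    if match is None:
        raise ValueError(f"unsupported rate: {value!r}")
    return int(match.group(1))


def committed_vs_rate(config: str) -> tuple[int, int]:
    """Return (sum of explicit class commits, interface rate) in Mbit."""
    committed = 0
    interface_rate = 0
    for line in config.splitlines():
        words = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if not words:
            continue
        if words[0] == "interface" and "rate" in words:
            interface_rate = parse_mbit(words[words.index("rate") + 1])
        elif words[0] == "class" and "commit" in words:
            committed += parse_mbit(words[words.index("commit") + 1])
    return committed, interface_rate


# Interface and class lines copied from the example configuration.
EXAMPLE = """
interface eth0 world bidirectional ethernet balanced rate 50Mbit
   class dns commit 1Mbit
   class ssh commit 2Mbit
   class rsync commit 2Mbit max 10Mbit
   class web_server commit 40Mbit
   class nms commit 1Mbit
"""

committed, rate = committed_vs_rate(EXAMPLE)
print(f"{committed} of {rate} Mbit committed")  # 46 of 50 Mbit committed
```

For the demo configuration this leaves 4 Mbit that is not explicitly committed to any class. On a slower link the same commits would not fit, which is why the speeds need adapting.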

And this is what you are going to get:

![image](https://cloud.githubusercontent.com/assets/2662304/14436322/c91d90a4-0024-11e6-9fb1-57cdef1580df.png)

## QoS Configuration with tc

First, set up the `tc` rules in `rc.local`, using commands that assign different QoS markings to different classids. You can see one such example in [GitHub issue #4563](https://github.com/netdata/netdata/issues/4563#issuecomment-455711973).

Then, map the classids to names by creating `/etc/iproute2/tc_cls`. For example:

```
2:1 Standard
2:8 LowPriorityData
2:10 HighThroughputData
2:16 OAM
2:18 LowLatencyData
2:24 BroadcastVideo
2:26 MultimediaStreaming
2:32 RealTimeInteractive
2:34 MultimediaConferencing
2:40 Signalling
2:46 Telephony
2:48 NetworkControl
```
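
The file is a plain two-column mapping: a classid, whitespace, then the name to display. You do not need to parse it yourself, but the following Python sketch shows how the format reads. The function name `parse_tc_cls` is hypothetical, and skipping lines that start with `#` is an assumption about comment handling.

```python
def parse_tc_cls(text: str) -> dict[str, str]:
    """Map classids (kept as strings, e.g. '2:46') to their names."""
    names: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines carry no mapping.
        if not line or line.startswith("#"):
            continue
        # A line with only one column raises ValueError here.
        classid, name = line.split(maxsplit=1)
        names[classid] = name
    return names


SAMPLE = """\
2:1 Standard
2:8 LowPriorityData
2:46 Telephony
"""

names = parse_tc_cls(SAMPLE)
print(names["2:46"])  # Telephony
```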

Add the following configuration option in `/etc/netdata.conf`:

```
[plugin:tc]
  enable show all classes and qdiscs for all interfaces = yes
```

Finally, create `/etc/netdata/tc-qos-helper.conf` with this content:

`tc_show="class"`

Please note that by default Netdata enables monitoring of metrics only when they are not zero. If they are constantly zero, they are ignored. Metrics that start having values after Netdata is started will be detected, and charts will be automatically added to the dashboard (a refresh of the dashboard is needed for them to appear, though). Set `yes` for a chart instead of `auto` to enable it permanently. You can also set the `enable zero metrics` option to `yes` in the `[global]` section, which enables charts with zero metrics for all internal Netdata plugins.
