mirror of https://github.com/proski/madwifi
Merge -dfs to trunk - r3361 2/2
Formatting git-svn-id: http://madwifi-project.org/svn/madwifi/trunk@3364 0192ed92-7a03-0410-a25b-9323aeb14dbd

Minstrel


Introduction
==============================================================================

This code is called minstrel, because we have taken a wandering minstrel
approach. Wander around the different rates, singing wherever you can. And
then, look at the performance, and make a choice. Note that the wandering
minstrel will always wander in directions where he/she feels he/she will get
paid the best for his/her work.

The minstrel autorate selection algorithm is an EWMA-based algorithm and is
derived from

1) An initial rate module we released in 2005,
   http://sourceforge.net/mailarchive/forum.php?forum_id=33966&max_rows=25&style=flat&viewmonth=200501&viewday=5

2) the "sample" module in the madwifi-ng source tree.

The code released in 2005 had some algorithmic and implementation flaws (one
of which was that it was based on the old madwifi codebase) and was shown to
be unstable. Performance of the sample module is poor
(http://www.madwifi.org/ticket/989), and we have observed similar issues.

We noted:

1) The rate chosen by sample did not alter to match changes in the radio
   environment.

2) Higher throughput (between two nodes) could often be achieved by fixing
   the bitrate of both nodes to some value.

3) After a long period of operation, "sample" appeared to be stuck in a low
   data rate, and would not move to a higher data rate.

We examined the code in sample, and decided the best approach was a rewrite
based on sample and the module we released in January 2005.

Theory of operation
==============================================================================

We defined the measure of successfulness (of packet transmission) as

                                Megabits transmitted
    Prob_success_transmission * --------------------
                                    elapsed time

This measure of successfulness will therefore adjust the transmission speed to
get the maximum number of data bits through the radio interface. Further, it
means that the 1 Mbps rate (which has a very high probability of successful
transmission) will not be used in preference to the 11 Mbps rate.
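
To make the metric concrete, the standalone sketch below ranks two rates by
this measure. It is illustrative only, not driver code: the packet size, air
times and probabilities are placeholder numbers.

    /* Sketch: rank rates by  Prob_success * (megabits sent / elapsed time).
     * The air times would come from the 802.11 duration calculation for a
     * 1200-byte packet at each rate; the values below are placeholders. */
    #include <stdio.h>

    struct rate_stat {
        int rate_kbps;        /* nominal bit rate */
        double prob_success;  /* probability of successful transmission, 0..1 */
        double tx_time_us;    /* air time for one 1200-byte packet, microseconds */
    };

    static double throughput_metric(const struct rate_stat *r)
    {
        double mbits = (1200.0 * 8.0) / 1e6;    /* megabits per packet */
        return r->prob_success * mbits / (r->tx_time_us / 1e6);
    }

    int main(void)
    {
        struct rate_stat rates[] = {
            { 1000,  0.95, 9900.0 },   /* 1 Mbps: reliable but slow      */
            { 11000, 0.85, 1300.0 },   /* 11 Mbps: less reliable, faster */
        };
        for (int i = 0; i < 2; i++)
            printf("%d kbps -> %.2f Mbit/s effective\n",
                   rates[i].rate_kbps, throughput_metric(&rates[i]));
        return 0;
    }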

We decided that the module should record the successfulness of all packets
that are transmitted. From this data, the module has sufficient information
to [...] element was required. We had to force the module to examine bit
rates other than optimal. Consequently, some percent of the packets have to
be sent at rates regarded as non-optimal.

10 times a second (this frequency is alterable by changing the driver code) a
timer fires, which evaluates the statistics table. EWMA calculations
(described below) are used to process the success history of each rate. On
completion of the calculation, a decision is made as to the rate which has the
best throughput, second best throughput, and highest probability of success.
This data is used for populating the retry chain during the next 100 ms.

As stated above, the minstrel algorithm collects statistics from all packet
attempts. Minstrel spends a particular percentage of frames doing "look
around", i.e. randomly trying other rates, to gather statistics. The
percentage of "look around" frames is set at boot time via the module
parameter "ath_lookaround_rate" and defaults to 10%. The distribution of
lookaround frames is also randomised somewhat to avoid any potential
"strobing" of lookaround between similar nodes.
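
A minimal sketch of that decision follows. It is not the driver's code; the
random source and the way the sampled rate is chosen are simplified, and only
the 10% default is taken from this document.

    /* Sketch: roughly ath_lookaround_rate percent of frames are sent at a
     * randomly chosen rate instead of the current best-throughput rate. */
    #include <stdio.h>
    #include <stdlib.h>

    #define ATH_LOOKAROUND_RATE 10        /* percent, module parameter default */

    /* randomised so that two similar nodes do not sample in lock-step */
    static int is_lookaround_frame(void)
    {
        return (rand() % 100) < ATH_LOOKAROUND_RATE;
    }

    static int pick_tx_rate(int best_tp_rate, int n_rates)
    {
        if (is_lookaround_frame())
            return rand() % n_rates;      /* sample some other rate */
        return best_tp_rate;              /* normal frame: best throughput */
    }

    int main(void)
    {
        int sampled = 0;
        for (int i = 0; i < 1000; i++)
            if (pick_tx_rate(3, 4) != 3)  /* 4 rates, best index is 3 */
                sampled++;
        printf("%d of 1000 frames sampled a non-best rate\n", sampled);
        return 0;
    }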

TCP theory tells us that each packet sent must be delivered in under 26 ms.
Any longer duration, and the TCP network layers will start to back off. A
delay of 26 ms implies that there is congestion in the network, and that fewer
packets should be injected to the device. Our conclusion was to adjust the
retry chain of each packet so the retry chain was guaranteed to be finished in
under 26 ms.

Retry Chain
==============================================================================

The HAL provides a multirate retry chain - which consists of four
segments. Each segment is an advisement to the HAL to try to send the current
packet at some rate, with a fixed number of retry attempts. Once the packet is
successfully transmitted, the remainder of the retry chain is
ignored. Selection of the number of retry attempts was based on the desire to
get the packet out in under 26 ms, or fail. We provided a module parameter,
ath_segment_size, which has units of microseconds, and specifies the maximum
duration one segment in the retry chain can last. This module parameter has a
default of 6000. Our view is that a segment size of between 4000 and 6000
seems to fit most situations.

There is some room for movement here - if the traffic is UDP then the limit of
26 ms for the retry chain length is "meaningless". However, one may argue that
if the packet was not transmitted after some time period, it should
fail. Further, one does expect UDP packets to fail in transmission. We leave
it as an area for future improvement.

[...] the retry chain is less than 26 ms.

After some discussion, we have adjusted the code so that the lowest rate is
never used for the lookaround packet. Our view is that since this rate is used
for management packets, this rate must be working. Alternatively, the link is
set up with management packets, data packets are acknowledged with management
packets. Should the lowest rate stop working, the link is going to die
reasonably soon.

Analysis of information in the /proc/net/madwifi/athX/rate_info file showed
that the system was sampling too hard at some rates. For those rates that
never work (54 Mbps, 500 m range) there is no point in sending 10 sample
packets (< 6 ms time). Consequently, for the very very low probability rates,
we sample at most twice.

The retry chain above does "work", but performance is suboptimal. The key
problem is that when the link is good, too much time is spent sampling the
slower rates. Thus, for two nodes adjacent to each other, the throughput
between them was several Mbps below using a fixed rate. The view was that
minstrel should not sample at the slower rates if the link is doing
well. However, if the link deteriorates, minstrel should immediately sample at
the lower rates.

Some time later, we realised that the only way to code this reliably was to
use the retry chain as the method of determining if the slower rates are
sampled. The retry chain was modified as:

Try | Lookaround rate               | Normal rate
    | random < best | random > best |
[...]

With this retry chain, if the randomly selected rate is slower than the
current best throughput, the randomly selected rate is placed second in the
chain. If the link is not good, then there will be data collected at the
randomly selected rate. Thus, if the best throughput rate is currently 54
Mbps, the only time slower rates are sampled is when a packet fails in
transmission. Consequently, if the link is ideal, all packets will be sent at
the full rate of 54 Mbps. Which is good.
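
A simplified sketch of how such a four-segment chain might be filled in is
shown below. It is illustrative only: the structure and variable names are
invented, rate indices are assumed to be ordered from slowest to fastest, and
using the highest-probability rate third and the lowest rate last follows the
general description above rather than the exact driver code.

    struct retry_segment {
        int rate_index;    /* index into the rate table, slowest..fastest */
        int tries;         /* attempts allowed in this segment */
    };

    /* best      - rate with the best throughput over the last 100 ms
     * best2     - rate with the second best throughput
     * best_prob - rate with the highest probability of success
     * lowest    - lowest (base) rate
     * sample    - randomly chosen rate, only used for lookaround frames */
    void fill_retry_chain(struct retry_segment chain[4], int lookaround,
                          int sample, int best, int best2,
                          int best_prob, int lowest, int tries)
    {
        if (lookaround && sample > best) {
            chain[0].rate_index = sample;   /* faster random rate: try it first */
            chain[1].rate_index = best;
        } else if (lookaround) {
            chain[0].rate_index = best;     /* slower random rate goes second,  */
            chain[1].rate_index = sample;   /* so it is only used after failure */
        } else {
            chain[0].rate_index = best;     /* normal frame */
            chain[1].rate_index = best2;
        }
        chain[2].rate_index = best_prob;
        chain[3].rate_index = lowest;

        for (int i = 0; i < 4; i++)
            chain[i].tries = tries;
    }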

EWMA
==============================================================================

The EWMA calculation is carried out 10 times a second, and is run for each
rate. This calculation has a smoothing effect, so that new results have a
reasonable (but not large) influence on the selected rate. However, with time,
a series of new results in some particular direction will predominate. Given
this smoothing, we can use words like inertia to describe the EWMA.

By "new results", we mean the results collected in the just completed 100 ms
interval. Old results are the EWMA scaling values from before the just
completed 100 ms interval.

EWMA scaling is set by the module parameter ath_ewma_level, and defaults to
75%. A value of 0% means use only the new results, ignore the old results. A
value of 99% means use the old results, with a tiny influence from the new
results.

The calculation (performed for each rate, at each timer interrupt) of the
probability of success is:

        Psuccess_this_time_interval * (100 - ath_ewma_level) + (Pold * ath_ewma_level)
Pnew =  -------------------------------------------------------------------------------
                                             100

[...] calculation is carried out. The Psuccess value for this rate is not
changed. If the new time interval is the first time interval (the module has
just been inserted), then Pnew is calculated from above with Pold set to 0.
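
A small standalone sketch of this update, using integer percentages like the
module parameter (not the driver's exact arithmetic):

    /* Sketch of the per-rate EWMA update run at each 100 ms timer tick. */
    #include <stdio.h>

    #define ATH_EWMA_LEVEL 75   /* module parameter default, percent */

    /* p_old and p_this_interval are success percentages in the range 0..100 */
    static int ewma_update(int p_old, int p_this_interval, int first_interval)
    {
        if (first_interval)
            p_old = 0;          /* module just loaded: no history yet */
        return (p_this_interval * (100 - ATH_EWMA_LEVEL) +
                p_old * ATH_EWMA_LEVEL) / 100;
    }

    int main(void)
    {
        int p = 0;
        /* three consecutive intervals in which 90% of attempts succeeded */
        for (int i = 0; i < 3; i++) {
            p = ewma_update(p, 90, i == 0);
            printf("after interval %d: Pnew = %d%%\n", i + 1, p);
        }
        return 0;
    }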

The appropriate update interval was selected on the basis of choosing a
compromise between

* collecting enough success/failure information to be meaningful
* minimising the amount of cpu time spent doing the updates
* providing a means to recover quickly enough from a bad rate selection.

The first two points are self explanatory. When there is a sudden change in
the radio environment, an update interval of 100 ms will mean that the rates
marked as optimal are very quickly marked as poor. Consequently, the sudden
change in radio environment will mean that minstrel will very quickly switch
to a better rate.

A sudden change in the transmission probabilities will happen when the node
has not transmitted any data for a while, and during that time [...] of
success at each rate will be quite different. The driver must adapt as quickly
as possible, so as to not upset the higher TCP network layers.

Module Parameters
==============================================================================

The module has three parameters:

name                   default value   purpose
ath_lookaround_rate    10              percentage of "look around" frames
ath_ewma_level         75              EWMA scaling level (percent)
ath_segment_size       6000            maximum duration of a retry segment (microseconds)
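
For illustration, the three parameters could be declared in the driver roughly
as below. This is only a sketch: the actual madwifi declarations may differ in
type, permissions and placement.

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    /* Defaults are the documented values; they would normally be overridden
     * at module load time (for example as insmod/modprobe options). */
    static int ath_lookaround_rate = 10;   /* percent of "look around" frames      */
    static int ath_ewma_level      = 75;   /* EWMA scaling level, percent          */
    static int ath_segment_size    = 6000; /* retry segment duration, microseconds */

    module_param(ath_lookaround_rate, int, 0444);
    module_param(ath_ewma_level, int, 0444);
    module_param(ath_segment_size, int, 0444);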

Test Network
==============================================================================

We used three computers in our test network. The first two were equipped with
atheros cards running in adhoc mode. We used a program that sends a fixed
number of TCP packets between computers, and reports on the data rate. The
application reports on the data rate - at an application layer level, which is
[...] used on the air. This computer was a form of "logging of the connection"
without introducing any additional load on the first two computers.

It was from monitoring the results on the third computer that we started to
get some confidence in the correctness of the code. We observed TCP backoffs
(described above) on this box. There was much celebration when the throughput
increased simply because the retry chain was finished in under 26 ms.

Our view was that throughput between the two computers should be as close as
possible (or better than) what can be achieved by setting both ends to fixed
rates. Thus, if setting both ends to fixed rates significantly increases the
throughput, a reasonable conclusion is that a fault exists in the adaptive
rate code.

We recorded throughputs (with minstrel) that are within 10% of what is
achieved with the experimentally determined optimum fixed rate.

Notes on Timing
==============================================================================

As noted above, minstrel calculates the throughput for each rate. This
calculation (using a packet of size 1200 bytes) determines the transmission
time on the radio medium. In these calculations, we assume a contention window
min and max value of 4 and 10 microseconds respectively.

Further, calculation of the transmission time is required so that we can
guarantee a packet is transmitted (or dropped) in a minimum time period. The
transmission time is used in determining how many times a packet is
transmitted in each segment of the retry chain.
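
A rough sketch of that calculation is below. The air times are placeholder
numbers; the real driver derives them from the 802.11 duration rules
(preamble, ACK and contention window included).

    /* Sketch: how many tries fit into one retry segment, so that the whole
     * four-segment chain stays within roughly 4 * ath_segment_size us. */
    #include <stdio.h>

    #define ATH_SEGMENT_SIZE 6000   /* microseconds, module parameter default */

    static int tries_per_segment(double tx_time_us)
    {
        int tries = (int)(ATH_SEGMENT_SIZE / tx_time_us);
        if (tries < 1)
            tries = 1;              /* always allow at least one attempt */
        return tries;
    }

    int main(void)
    {
        /* illustrative air times for a 1200-byte packet, in microseconds */
        printf("54 Mbps: %d tries per segment\n", tries_per_segment(350.0));
        printf(" 1 Mbps: %d tries per segment\n", tries_per_segment(9900.0));
        return 0;
    }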

Indeed, the card will supply the cwmin/cwmax values directly:

    iwpriv if_name get_cwmin <0|1|2|3> <0|1>

We have not made direct calls to determine cwmin/cwmax - this is an area for
future work. Indeed, the cwmin/cwmax determination code could check to see if
the user has altered these values with the appropriate iwpriv.

The contention window size does vary with traffic class. For example, video
and voice have a contention window min of 3 and 2 microseconds
respectively. Currently, minstrel does not check traffic class.

Calculating the throughputs based on traffic class and bit rate and variable
packet size will significantly complicate the code and require many more
sample packets. More sample packets will lower the throughput achieved. Thus,
our view is that for this release, we should take a simple (but reasonable)
approach that works stably and gives good throughputs.

Values of cwmin/cwmax of 4 and 10 microseconds are from
[...] Quality of Service (QoS) P802.11e/D12.0, November 2004.

Internal variable reporting
==============================================================================

The minstrel algorithm reports to the proc file system its internal
statistics, which can be viewed as text. A sample output is below:

[...] used in retry chain are selected. The rates with the maximum throughput,
second maximum throughput and maximum probability are indicated by the letters
T, t, and P respectively.

The statistics gathered in the last 100 ms time period are displayed in the
"this prob" and "this succ/attempt" columns.

Finally, the number of packets transmitted at each rate, since module loading
[...] note that we use the words "succ" or "success" to mean packets
successfully sent from this node to the remote node. The driver determines
success by analysing reports from the hal. The word "attempt" or "attempts"
means the count of packets that we transmitted. Thus, the number in the
success column will always be lower than the number in the attempts column.

When the two nodes are brought closer together, the statistics start changing,
and you see more successful attempts at the higher rates. The ewma prob at the
higher rates increases and then most packets are conveyed at the higher rates.

When the rate is not on auto, but fixed, this table is still available, and
will report the throughput etc for the current bit rate. Changing the rate
from auto to fixed to auto will completely reset this table, and the operation
of the minstrel module.