initial commit

longpanda 2020-04-05 00:07:50 +08:00
parent 2090c6fa97
commit 05a1b863a6
487 changed files with 114253 additions and 0 deletions

5
VBLADE/vblade-master/.gitignore vendored Normal file

@ -0,0 +1,5 @@
*.orig
cscope.*
*.rej
*~
*.o


@ -0,0 +1,340 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.


@ -0,0 +1,36 @@
Contributions to the vblade are welcome.
In contributing, though, please stay true to the original simplicity
of the software. Many open source projects suffer from the "creeping
feature demon" phenomenon. If you think the vblade needs a great new
feature, first seriously try to think of a way to accomplish your goal
without adding to the vblade itself.
Patches should be clean (to the point and easy to read) and should do
one thing. (Avoid, for example, mixing style changes with substantive
changes.) Send multiple patches if necessary. Patches should be
generated with "diff -uprN" if possible, and should be designed to be
applied with "patch -p1".
When possible, the best way to submit a patch is by sending it to the
aoetools-discuss list. You can subscribe at the aoetools project web
page on sourceforge.net.
When you send your patch, here are some things to cover:
* What version of the vblade did you use to generate the patch?
(Hopefully it was the latest.)
* What was your motivation for creating the patch? That is, what
problem does it solve?
* What testing did you perform to ensure that your patch did not
introduce bugs and accomplished what you intended?
* If your changes affect the end-user experience, have you updated
the vblade documentation?
* Is your email client able to send a patch without changing it?
Many email clients and servers corrupt patches. Please test your
email chain by sending and applying a patch before sending your
patch to the mailing list.

155
VBLADE/vblade-master/NEWS Normal file

@ -0,0 +1,155 @@
-*- change-log -*-
2018-08-25 Christoph Biedl <sourceforge.bnwi@manchmal.in-ulm.de>
Print helpful error and exit immediately for missing device
vblade-24
2017-11-19 Catalin Salgau <csalgau@users.sourceforge.net>
On FreeBSD limit used MTU to address BPF limitation
2015-06-14 Ed Cashin <ed.cashin@acm.org>
Add convenience script for creating sparse files
vblade-23
2015-02-25 Catalin Salgau <csalgau@users.sourceforge.net>
Warn about Windows problem with CHS misalignment
2014-09-13 Ed Cashin <ed.cashin@acm.org>
code cleanup: remove unused variables
2014-08-10 Ed Cashin <ed.cashin@acm.org>
vblade-22
2014-06-08 Ed Cashin <ed.cashin@acm.org>
update version for v22 release candidate 1
buffer boundary cleanups
FreeBSD BPF and MTU fixes from Catalin Salgau
offset and size options by Christoph Biedl
vblade-22rc1
2013-03-18 Ed Cashin <ecashin@coraid.com>
add big-endian support from Daniel Mealha Cabrita <dancab@gmx.net>
vblade-21
2009-08-14 Sam Hopkins <sah@coraid.com>
bugfix: aoe command error did not set Error bit in flags
add support for AoEr11
set ident serial to shelf.slot:hostname
vblade-20
2008-10-08 Ed Cashin <ecashin@coraid.com>
add Chris Webb's bpf fix for FreeBSD
add Ryan Thomas's fix to stop bufcnt being overridden
vblade-19
2008-07-14 Ed Cashin <ecashin@coraid.com>
add Chris Webb's block device options patch
add Chris Webb's socket options patch for better jumbo handling
remove obsolete contrib/o_direct.diff
vblade-18
2008-06-09 Ed Cashin <ecashin@coraid.com>
add Chris Webb's latest BPF patch to vblade, remove from contrib
update contributed AIO patch for compatibility with current sources
vblade-17
2008-05-07 Ed Cashin <ecashin@coraid.com>
add Chris Webb's AIO patch to the contributions
add Chris Webb's BPF patch to the contributions
vblade-16
2008-02-20 Ed Cashin <ecashin@coraid.com>
require the amount of data we use, not the amount ethernet requires
make sure the packet length agrees with the config query length
make sure the packet length agrees with the amount to write
remove newline embedded in fw version field of ATA dev ID response
vblade-15
2006-11-20 Sam Hopkins <sah@coraid.com>
apply contrib jumbo patch to standard distribution
add jumbo configuration app. note in README
add jumbo README reference to manpage
add mask feature; -m flag
update manpage to describe -m flag
vblade-14
2006-10-05 Sam Hopkins <sah@coraid.com>
fix confcmd memcpy bug
correct scnt return value in read/write ata response
replace O_RDONLY fallback with explicit stat. root always wins.
vblade-13
2006-10-04 Sam Hopkins <sah@coraid.com>
fix confcmd buglets
fix atacmd buglets
add atacmd handling for bad argument errors
add O_RDONLY open if O_RDWR fails
add contrib patch directory
add contrib/README
add jumbo patch to contrib
add o_direct patch to contrib
vblade-12
2006-09-21 "Adam J. Richter" <adam@yggdrasil.com>
add install target for makefile
vblade-11
2005-12-06 Ed Cashin <ecashin@coraid.com>
fix u64 configuration on FreeBSD
release vblade-10
2005-12-06 Valeriy Glushkov <valery@rocketdivision.com>
implemented config string support
added handler for ATA Check power mode command
2005-11-15 Ed Cashin <ecashin@coraid.com>
add compatibility with platforms lacking u64 (e.g., Slackware)
release vblade-9
2005-11-10 Ed Cashin <ecashin@coraid.com>
call atainit on program startup
put VBLADE_VERSION in dat.h and use it in firmware version
release vblade-7
include Stacey's patch to use p{read,write} on FreeBSD
include Stacey's patch to typedef ulong on FreeBSD
fix makefile dependencies (e.g., rebuild on new aoe.c)
fix config string length specification
include Stacey's patch to avoid compile warnings on FreeBSD
release vblade-8
2005-11-10 "Stacey D. Son" <sson@verio.net>
include FreeBSD support
2005-10-03 Ed Cashin <ecashin@coraid.com>
don't invoke vblade with dash from vbladed
2005-08-31 20:14:12 GMT Ed Cashin <ecashin@coraid.com>
ATA identify: don't juggle bytes in shorts on big endian arch
add manpage for vblade, vbladed
release vblade-6
2005-03-17 15:24:30 GMT Ed Cashin <ecashin@coraid.com>
follow up on vblade-2's off-by-one patch, making end of device usable
release vblade-5
2005-03-15 22:03:17 GMT Ed Cashin <ecashin@coraid.com>
don't rely on kernel headers for defining the aoe type 0x88a2
release vblade-4
2005-03-15 17:27:01 GMT Ed Cashin <ecashin@coraid.com>
docs: aoe-2.6-7 is the first driver to support multiple blades per mac
release vblade-3
2005-03-11 18:30:26 GMT Ed Cashin <ecashin@coraid.com>
put 64-bit configuration into config.h file
don't use uninitialized variables
broadcast config query on startup
clarify desired patch format in HACKING
add sah@coraid.com's vblade-1.ata.c.patch: fix off-by-one and ext LBA
add docs, remove daemonizing code from vblade
release vblade-2
2005-02-08 20:21:52 GMT Ed Cashin <ecashin@coraid.com>
starting documentation
add script that daemonizes vblade process, logging output
make vblade sources -Wall clean, use daemon(3)
release vblade-1

146
VBLADE/vblade-master/README Normal file

@ -0,0 +1,146 @@
INTRODUCTION
------------
The vblade is a minimal ATA over Ethernet (AoE) storage target. Its
focus is simplicity, not performance or richness of features. It
exports a seekable file available over an ethernet local area network
(LAN) via the AoE data storage protocol.
The name, "vblade," is historical: It is a virtual EtherDrive (R)
blade. The first AoE target hardware sold by Coraid was in a blade
form factor, ten to a 4-rack-unit chassis.
The seekable file is typically a block device like /dev/md0 but even
regular files will work. Sparse files can be especially convenient.
When vblade exports the block storage over AoE it becomes a storage
target. Another host on the same LAN can access the storage if it has
a compatible aoe kernel driver.
BUILDING
--------
The following command should build the vblade program on a Linux-based
system:
make
For FreeBSD systems, include an extra parameter like so:
make PLATFORM=freebsd
EXAMPLES
--------
There is a "vbladed" script that daemonizes the program and sends its
output to the logger program. Make sure you have logger installed if
you would like to run vblade as a daemon with the vbladed script.
ecashin@kokone vblade$ echo 'I have logger' | logger
ecashin@kokone vblade$ tail -3 /var/log/messages
Feb 8 14:52:49 kokone -- MARK --
Feb 8 15:12:49 kokone -- MARK --
Feb 8 15:19:56 kokone logger: I have logger
Here is a short example showing how to export a block device with a
vblade. (This is a loop device backed by a sparse file, but you could
use any seekable file instead of /dev/loop7.)
ecashin@kokone vblade$ make
cc -Wall -c -o aoe.o aoe.c
cc -Wall -c -o linux.o linux.c
cc -Wall -c -o ata.o ata.c
cc -o vblade aoe.o linux.o ata.o
ecashin@kokone vblade$ su
Password:
root@kokone vblade# modprobe loop
root@kokone vblade# dd if=/dev/zero bs=1k count=1 seek=`expr 1024 \* 4096` of=bd-file
1+0 records in
1+0 records out
1024 bytes transferred in 0.009901 seconds (103423 bytes/sec)
root@kokone vblade# losetup /dev/loop7 bd-file
root@kokone vblade# ./vblade 9 0 eth0 /dev/loop7
ioctl returned 0
4294968320 bytes
pid 16967: e9.0, 8388610 sectors
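The numbers in this transcript can be checked by hand. The following is a minimal sketch in C, not part of vblade; the helper names dd_file_size and sectors512 are hypothetical, and it assumes the sector count vblade prints is simply the byte size divided by 512.

```c
#include <stdint.h>

/* Size in bytes of a file produced by dd when it seeks `seek` blocks of
 * `bs` bytes into the output and then writes `count` blocks. */
uint64_t
dd_file_size(uint64_t bs, uint64_t seek, uint64_t count)
{
	return (seek + count) * bs;
}

/* Number of whole 512-byte sectors in `bytes` bytes. */
uint64_t
sectors512(uint64_t bytes)
{
	return bytes / 512;
}
```

With bs=1024, seek=1024*4096 and count=1, dd_file_size gives the 4294968320 bytes reported above, and sectors512 of that value gives the 8388610 sectors that vblade reports.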
Here's how you can use the Linux aoe driver to access the storage from
another host on the LAN.
ecashin@kokone ecashin$ ssh makki
Last login: Mon Feb 7 10:25:04 2005
ecashin@makki ~$ su
Password:
root@makki ecashin# modprobe aoe
root@makki ecashin# aoe-stat
e9.0 eth1 up
root@makki ecashin# mkfs -t ext3 /dev/etherd/e9.0
mke2fs 1.35 (28-Feb-2004)
...
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 24 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
root@makki ecashin# mkdir /mnt/e9.0
root@makki ecashin# mount /dev/etherd/e9.0 /mnt/e9.0
root@makki ecashin# echo hooray > /mnt/e9.0/test.txt
root@makki ecashin# cat /mnt/e9.0/test.txt
hooray
Remember: be as careful with these devices as you would with /dev/hda!
Jumbo Frame Compatibility
-------------------------
Vblade can use jumbo frames provided your initiator is jumbo frame
capable. There is one small configuration gotcha to consider
to avoid having the kernel on the vblade host frequently drop frames.
Vblade uses a raw socket to perform AoE. The linux kernel will
only buffer a certain amount of data for a raw socket. For 2.6
kernels, this value is managed through /proc:
root@nai aoe# grep . /proc/sys/net/core/rmem_*
/proc/sys/net/core/rmem_default:128000
/proc/sys/net/core/rmem_max:128000
rmem_max is the max amount a user process may expand the receive
buffer to -- through setsockopt(...) -- and rmem_default is, as you
might expect, the default.
The gotcha is that this limit counts the memory the kernel actually
allocates for each frame, not the amount of user data the frame
carries. As an example, the Intel GbE controller must be given 16KB
frames to use an MTU over 8KB. For each received frame, the kernel
must be able to buffer 16KB, even if the aoe frame is only 60 bytes
in length.
The linux aoe initiator will use 16 outstanding frames when
used with vblade. A good default for ensuring frames are
not dropped is to allocate 16KB for 17 frames:
for f in /proc/sys/net/core/rmem_*; do echo $((17 * 16 * 1024)) >$f; done
Be sure to start vblade after changing the buffering defaults
as the buffer value is set when the socket is opened.
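
The sizing rule in this section can also be written down in C. This is only a sketch under stated assumptions, not vblade's own code: rcvbuf_bytes and set_rcvbuf are hypothetical names, and the socket is assumed to be already open. It uses the standard setsockopt(2) call with SO_RCVBUF, which the kernel limits to rmem_max as described above.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Bytes needed to hold (outstanding + 1) frames when the kernel
 * allocates frame_kb KB for every received frame. */
long
rcvbuf_bytes(int outstanding, int frame_kb)
{
	return (long)(outstanding + 1) * frame_kb * 1024;
}

/* Hypothetical helper: request a receive buffer of that size on an
 * open socket. Returns 0 on success, -1 on error with errno set. */
int
set_rcvbuf(int sock, int outstanding, int frame_kb)
{
	int n = (int)rcvbuf_bytes(outstanding, frame_kb);

	return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &n, sizeof n);
}
```

For 16 outstanding frames of 16KB each, rcvbuf_bytes returns 278528, the same value the shell loop above writes to /proc.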
AoE Initiator Compatibility
---------------------------
The Linux aoe driver for the 2.6 kernel is compatible if you use
aoe-2.6-7 or newer. You can use older aoe drivers but you will only
be able to see one vblade per MAC address.
Contrib Patches
---------------
see contrib/README
Kvblade
-------
While vblade runs as a userland process (like "ls" or "vi"), there
is another program that runs inside the kernel. It is called
kvblade. It is alpha software.

740
VBLADE/vblade-master/aoe.c Normal file

@ -0,0 +1,740 @@
// aoe.c: the ATA over Ethernet virtual EtherDrive (R) blade
#define _GNU_SOURCE
#include "config.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <netinet/in.h>
#include "dat.h"
#include "fns.h"
enum {
Nmasks= 32,
Nsrr= 256,
Alen= 6,
};
uchar masks[Nmasks*Alen];
int nmasks;
uchar srr[Nsrr*Alen];
int nsrr;
char config[Nconfig];
int nconfig = 0;
int maxscnt = 2;
char *ifname;
int bufcnt = Bufcount;
#ifndef O_BINARY
#define O_BINARY 0
#endif
typedef unsigned long long u64_t;
typedef unsigned int u32_t;
#pragma pack(4)
typedef struct ventoy_img_chunk
{
u32_t img_start_sector; // sector size: 2KB
u32_t img_end_sector; // included
u64_t disk_start_sector; // in disk_sector_size
u64_t disk_end_sector; // included
}ventoy_img_chunk;
typedef struct ventoy_disk_map
{
u64_t img_start_sector;
u64_t img_end_sector;
u64_t disk_start_sector;
u64_t disk_end_sector;
}ventoy_disk_map;
#pragma pack()
static int verbose = 0;
static u64_t g_iso_file_size = 0;
static int g_img_map_num = 0;
static ventoy_disk_map *g_img_map = NULL;
static ventoy_disk_map * vtoydm_get_img_map_data(const char *img_map_file, int *plen)
{
int i;
int len;
int rc = 1;
u64_t sector_num;
FILE *fp = NULL;
ventoy_img_chunk *chunk = NULL;
ventoy_disk_map *map = NULL;
fp = fopen(img_map_file, "rb");
if (NULL == fp)
{
fprintf(stderr, "Failed to open file %s\n", img_map_file);
return NULL;
}
fseek(fp, 0, SEEK_END);
len = (int)ftell(fp);
fseek(fp, 0, SEEK_SET);
chunk = (ventoy_img_chunk *)malloc(len);
if (NULL == chunk)
{
fprintf(stderr, "Failed to malloc memory len:%d\n", len);
goto end;
}
if (fread(chunk, 1, len, fp) != len)
{
fprintf(stderr, "Failed to read file\n");
goto end;
}
if (len % sizeof(ventoy_img_chunk))
{
fprintf(stderr, "image map file size %d is not aligned with %d\n",
len, (int)sizeof(ventoy_img_chunk));
goto end;
}
map = (ventoy_disk_map *)malloc((len / sizeof(ventoy_img_chunk)) * sizeof(ventoy_disk_map));
if (NULL == map)
{
fprintf(stderr, "Failed to malloc memory\n");
goto end;
}
for (i = 0; i < len / sizeof(ventoy_img_chunk); i++)
{
sector_num = chunk[i].img_end_sector - chunk[i].img_start_sector + 1;
g_iso_file_size += sector_num * 2048;
map[i].img_start_sector = chunk[i].img_start_sector << 2;
map[i].img_end_sector = (chunk[i].img_end_sector << 2) + 3;
map[i].disk_start_sector = chunk[i].disk_start_sector;
map[i].disk_end_sector = chunk[i].disk_end_sector;
}
rc = 0;
end:
fclose(fp);
if (chunk)
{
free(chunk);
chunk = NULL;
}
*plen = len;
return map;
}
static void parse_img_chunk(const char *img_map_file)
{
int len;
g_img_map = vtoydm_get_img_map_data(img_map_file, &len);
if (g_img_map)
{
g_img_map_num = len / sizeof(ventoy_img_chunk);
}
}
static u64_t get_disk_sector(u64_t lba)
{
int i;
ventoy_disk_map *cur = g_img_map;
for (i = 0; i < g_img_map_num; i++, cur++)
{
if (lba >= cur->img_start_sector && lba <= cur->img_end_sector)
{
return (lba - cur->img_start_sector) + cur->disk_start_sector;
}
}
return 0;
}
int getsec(int fd, uchar *place, vlong lba, int nsec)
{
int i;
int count = 0;
u64_t last_sector;
u64_t sector;
count = 1;
last_sector = get_disk_sector((u64_t)lba);
for (i = 1; i < nsec; i++)
{
sector = get_disk_sector((u64_t)(lba + i));
if (sector == (last_sector + count))
{
count++;
}
else
{
// flush the contiguous run, then advance the output pointer past it
lseek(fd, last_sector * 512, SEEK_SET);
read(fd, place, count * 512);
place += count * 512;
last_sector = sector;
count = 1;
}
}
lseek(fd, last_sector * 512, SEEK_SET);
read(fd, place, count * 512);
return nsec * 512;
}
// read-only export: writes are acknowledged but discarded
int putsec(int fd, uchar *place, vlong lba, int nsec)
{
return nsec * 512;
}
void
aoead(int fd) // advertise the virtual blade
{
uchar buf[2000];
Conf *p;
int i;
p = (Conf *)buf;
memset(p, 0, sizeof *p);
memset(p->h.dst, 0xff, 6);
memmove(p->h.src, mac, 6);
p->h.type = htons(0x88a2);
p->h.flags = Resp;
p->h.maj = htons(shelf);
p->h.min = slot;
p->h.cmd = Config;
p->bufcnt = htons(bufcnt);
p->scnt = maxscnt = (getmtu(sfd, ifname) - sizeof (Ata)) / 512;
p->firmware = htons(FWV);
p->vercmd = 0x10 | Qread;
memcpy(p->data, config, nconfig);
p->len = htons(nconfig);
if (nmasks == 0)
if (putpkt(fd, buf, sizeof *p - sizeof p->data + nconfig) == -1) {
perror("putpkt aoe id");
return;
}
for (i=0; i<nmasks; i++) {
memcpy(p->h.dst, &masks[i*Alen], Alen);
if (putpkt(fd, buf, sizeof *p - sizeof p->data + nconfig) == -1)
perror("putpkt aoe id");
}
}
int
isbcast(uchar *ea)
{
uchar *b = (uchar *)"\377\377\377\377\377\377";
return memcmp(ea, b, 6) == 0;
}
long long
getlba(uchar *p)
{
vlong v;
int i;
v = 0;
for (i = 0; i < 6; i++)
v |= (vlong)(*p++) << i * 8;
return v;
}
int
aoeata(Ata *p, int pktlen) // do ATA request
{
Ataregs r;
int len = 60;
int n;
r.lba = getlba(p->lba);
r.sectors = p->sectors;
r.feature = p->err;
r.cmd = p->cmd;
if (r.cmd != 0xec)
if (!rrok(p->h.src)) {
p->h.flags |= Error;
p->h.error = Res;
return len;
}
if (atacmd(&r, (uchar *)(p+1), maxscnt*512, pktlen - sizeof(*p)) < 0) {
p->h.flags |= Error;
p->h.error = BadArg;
return len;
}
if (!(p->aflag & Write))
if ((n = p->sectors)) {
n -= r.sectors;
len = sizeof (Ata) + (n*512);
}
p->sectors = r.sectors;
p->err = r.err;
p->cmd = r.status;
return len;
}
#define QCMD(x) ((x)->vercmd & 0xf)
// yes, this makes unnecessary copies.
int
confcmd(Conf *p, int payload) // process conf request
{
int len;
len = ntohs(p->len);
if (QCMD(p) != Qread)
if (len > Nconfig || len > payload)
return 0; // if you can't play nice ...
switch (QCMD(p)) {
case Qtest:
if (len != nconfig)
return 0;
// fall thru
case Qprefix:
if (len > nconfig)
return 0;
if (memcmp(config, p->data, len))
return 0;
// fall thru
case Qread:
break;
case Qset:
if (nconfig)
if (nconfig != len || memcmp(config, p->data, len)) {
p->h.flags |= Error;
p->h.error = ConfigErr;
break;
}
// fall thru
case Qfset:
nconfig = len;
memcpy(config, p->data, nconfig);
break;
default:
p->h.flags |= Error;
p->h.error = BadArg;
}
memmove(p->data, config, nconfig);
p->len = htons(nconfig);
p->bufcnt = htons(bufcnt);
p->scnt = maxscnt = (getmtu(sfd, ifname) - sizeof (Ata)) / 512;
p->firmware = htons(FWV);
p->vercmd = 0x10 | QCMD(p); // aoe v.1
return nconfig + sizeof *p - sizeof p->data;
}
static int
aoesrr(Aoesrr *sh, int len)
{
uchar *m, *e;
int n;
e = (uchar *) sh + len;
m = (uchar *) sh + Nsrrhdr;
switch (sh->rcmd) {
default:
e: sh->h.error = BadArg;
sh->h.flags |= Error;
break;
case 1: // set
if (!rrok(sh->h.src)) {
sh->h.error = Res;
sh->h.flags |= Error;
break;
}
case 2: // force set
n = sh->nmacs * 6;
if (e < m + n)
goto e;
nsrr = sh->nmacs;
memmove(srr, m, n);
case 0: // read
break;
}
sh->nmacs = nsrr;
n = nsrr * 6;
memmove(m, srr, n);
return Nsrrhdr + n;
}
static int
addmask(uchar *ea)
{
uchar *p, *e;
p = masks;
e = p + nmasks;
for (; p<e; p += 6)
if (!memcmp(p, ea, 6))
return 2;
if (nmasks >= Nmasks)
return 0;
memmove(p, ea, 6);
nmasks++;
return 1;
}
static void
rmmask(uchar *ea)
{
uchar *p, *e;
p = masks;
e = p + nmasks;
for (; p<e; p+=6)
if (!memcmp(p, ea, 6)) {
memmove(p, p+6, e-p-6);
nmasks--;
return;
}
}
static int
aoemask(Aoemask *mh, int len)
{
Mdir *md, *mdi, *mde;
int i, n;
n = 0;
md = mdi = (Mdir *) ((uchar *)mh + Nmaskhdr);
switch (mh->cmd) {
case Medit:
mde = md + mh->nmacs;
for (; md<mde; md++) {
switch (md->cmd) {
case MDdel:
rmmask(md->mac);
continue;
case MDadd:
if (addmask(md->mac))
continue;
mh->merror = MEfull;
mh->nmacs = md - mdi;
goto e;
case MDnop:
continue;
default:
mh->merror = MEbaddir;
mh->nmacs = md - mdi;
goto e;
}
}
// success. fall thru to return list
case Mread:
md = mdi;
for (i=0; i<nmasks; i++) {
md->res = md->cmd = 0;
memmove(md->mac, &masks[i*6], 6);
md++;
}
mh->merror = 0;
mh->nmacs = nmasks;
n = sizeof *md * nmasks;
break;
default:
mh->h.flags |= Error;
mh->h.error = BadArg;
}
e: return n + Nmaskhdr;
}
void
doaoe(Aoehdr *p, int n)
{
int len;
switch (p->cmd) {
case ATAcmd:
if (n < Natahdr)
return;
len = aoeata((Ata*)p, n);
break;
case Config:
if (n < Ncfghdr)
return;
len = confcmd((Conf *)p, n);
break;
case Mask:
if (n < Nmaskhdr)
return;
len = aoemask((Aoemask *)p, n);
break;
case Resrel:
if (n < Nsrrhdr)
return;
len = aoesrr((Aoesrr *)p, n);
break;
default:
p->error = BadCmd;
p->flags |= Error;
len = n;
break;
}
if (len <= 0)
return;
memmove(p->dst, p->src, 6);
memmove(p->src, mac, 6);
p->maj = htons(shelf);
p->min = slot;
p->flags |= Resp;
if (putpkt(sfd, (uchar *) p, len) == -1) {
perror("write to network");
exit(1);
}
}
void
aoe(void)
{
Aoehdr *p;
uchar *buf;
int n, sh;
long pagesz;
enum { bufsz = 1<<16, };
if ((pagesz = sysconf(_SC_PAGESIZE)) < 0) {
perror("sysconf");
exit(1);
}
if ((buf = malloc(bufsz + pagesz)) == NULL) {
perror("malloc");
exit(1);
}
n = (size_t) buf + sizeof(Ata);
if (n & (pagesz - 1))
buf += pagesz - (n & (pagesz - 1));
aoead(sfd);
for (;;) {
n = getpkt(sfd, buf, bufsz);
if (n < 0) {
perror("read network");
exit(1);
}
if (n < sizeof(Aoehdr))
continue;
p = (Aoehdr *) buf;
if (ntohs(p->type) != 0x88a2)
continue;
if (p->flags & Resp)
continue;
sh = ntohs(p->maj);
if (sh != shelf && sh != (ushort)~0)
continue;
if (p->min != slot && p->min != (uchar)~0)
continue;
if (nmasks && !maskok(p->src))
continue;
doaoe(p, n);
}
}
void
usage(void)
{
fprintf(stderr, "usage: %s [-b bufcnt] [-o offset] [-l length] [-d ] [-s] [-r] [-t] [-v] [ -f img_map_file ] [ -m mac[,mac...] ] shelf slot netif filename\n",
progname);
exit(1);
}
/* parseether from plan 9 */
int
parseether(uchar *to, char *from)
{
char nip[4];
char *p;
int i;
p = from;
for(i = 0; i < 6; i++){
if(*p == 0)
return -1;
nip[0] = *p++;
if(*p == 0)
return -1;
nip[1] = *p++;
nip[2] = 0;
to[i] = strtoul(nip, 0, 16);
if(*p == ':')
p++;
}
return 0;
}
void
setmask(char *ml)
{
char *p;
int n;
for (; ml; ml=p) {
p = strchr(ml, ',');
if (p)
*p++ = '\0';
n = parseether(&masks[nmasks*Alen], ml);
if (n < 0)
fprintf(stderr, "ignoring mask %s, parseether failure\n", ml);
else
nmasks++;
}
}
int
maskok(uchar *ea)
{
int i, ok = 0;
for (i=0; !ok && i<nmasks; i++)
ok = memcmp(ea, &masks[i*Alen], Alen) == 0;
return ok;
}
int
rrok(uchar *ea)
{
int i, ok = 0;
if (nsrr == 0)
return 1;
for (i=0; !ok && i<nsrr; i++)
ok = memcmp(ea, &srr[i*Alen], Alen) == 0;
return ok;
}
void
setserial(int sh, int sl)
{
char h[32];
h[0] = 0;
gethostname(h, sizeof h);
snprintf(serial, Nserial, "%d.%d:%.*s", sh, sl, (int) sizeof h, h);
}
int
main(int argc, char **argv)
{
int ch, omode = 0, readonly = 0;
vlong length = 0;
char *end;
char filepath[300] = {0};
/* Avoid being killed by systemd */
if (access("/etc/initrd-release", F_OK) >= 0)
{
argv[0][0] = '@';
}
bufcnt = Bufcount;
offset = 0;
setbuf(stdin, NULL);
progname = *argv;
while ((ch = getopt(argc, argv, "b:dsrm:f:tv::o:l:")) != -1) {
switch (ch) {
case 'b':
bufcnt = atoi(optarg);
break;
case 'd':
#ifdef O_DIRECT
omode |= O_DIRECT;
#endif
break;
case 's':
omode |= O_SYNC;
break;
case 'r':
readonly = 1;
break;
case 'm':
setmask(optarg);
break;
case 't':
return 0;
case 'v':
verbose = 1;
break;
case 'f':
strncpy(filepath, optarg, sizeof(filepath) - 1);
break;
case 'o':
offset = strtoll(optarg, &end, 0);
if (end == optarg || offset < 0)
usage();
break;
case 'l':
length = strtoll(optarg, &end, 0);
if (end == optarg || length < 1)
usage();
break;
case '?':
default:
usage();
}
}
argc -= optind;
argv += optind;
if (argc != 4 || bufcnt <= 0)
usage();
omode |= readonly ? O_RDONLY : O_RDWR;
parse_img_chunk(filepath);
bfd = open(argv[3], omode);
if (bfd == -1) {
perror("open");
exit(1);
}
shelf = atoi(argv[0]);
slot = atoi(argv[1]);
setserial(shelf, slot);
size = g_iso_file_size; //getsize(bfd);
size /= 512;
if (size <= offset) {
if (offset)
fprintf(stderr,
"Offset %lld too large for %lld-sector export\n",
offset,
size);
else
fputs("0-sector file size is too small\n", stderr);
exit(1);
}
size -= offset;
if (length) {
if (length > size) {
fprintf(stderr, "Length %lld too big - exceeds size of file!\n", length);
exit(1);
}
size = length;
}
ifname = argv[2];
sfd = dial(ifname, bufcnt);
if (sfd < 0)
return 1;
getea(sfd, ifname, mac);
if (verbose) {
printf("pid %ld: e%d.%d, %lld sectors %s\n",
(long) getpid(), shelf, slot, size,
readonly ? "O_RDONLY" : "O_RDWR");
}
fflush(stdout);
atainit();
aoe();
return 0;
}
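
Note on the chunk translation in `vtoydm_get_img_map_data()` and `get_disk_sector()` above: each image chunk is given in 2048-byte sectors, while AoE requests use 512-byte LBAs, so one image sector spans four LBAs (hence `<< 2` and `+ 3`). The following is a minimal standalone sketch of that arithmetic, not code from this commit. The names `chunk_t`, `map_t`, `chunk_to_map` and `lookup` are hypothetical, and 512-byte disk sectors are assumed, as `getsec()` does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ventoy_img_chunk. */
typedef struct {
    uint32_t img_start;   /* first 2048-byte image sector */
    uint32_t img_end;     /* last 2048-byte image sector, inclusive */
    uint64_t disk_start;  /* first disk sector (assumed 512 bytes) */
    uint64_t disk_end;    /* last disk sector, inclusive */
} chunk_t;

/* Hypothetical stand-in for ventoy_disk_map: everything in 512-byte units. */
typedef struct {
    uint64_t img_start;
    uint64_t img_end;
    uint64_t disk_start;
    uint64_t disk_end;
} map_t;

/* Convert one chunk. The cast happens before the shift so the
 * multiplication by four is carried out in 64 bits. */
static map_t chunk_to_map(const chunk_t *c)
{
    map_t m;

    assert(c != NULL);
    m.img_start = (uint64_t)c->img_start << 2;
    m.img_end = ((uint64_t)c->img_end << 2) + 3;
    m.disk_start = c->disk_start;
    m.disk_end = c->disk_end;
    return m;
}

/* Linear search over the map. Returns 1 and stores the disk sector
 * in *out on a hit, 0 if no chunk covers the LBA. */
static int lookup(const map_t *maps, size_t n, uint64_t lba, uint64_t *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (lba >= maps[i].img_start && lba <= maps[i].img_end) {
            *out = lba - maps[i].img_start + maps[i].disk_start;
            return 1;
        }
    }
    return 0;
}
```

For example, a chunk covering image sectors 0 to 9 that starts at disk sector 1000 maps image LBA 5 to disk sector 1005. Unlike `get_disk_sector()`, which returns 0 on a miss, the sketch reports a miss separately, so that a miss cannot be confused with disk sector 0.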

VBLADE/vblade-master/ata.c Normal file

@ -0,0 +1,185 @@
// ata.c: ATA simulator for vblade
#include "config.h"
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include "dat.h"
#include "fns.h"
enum {
// err bits
UNC = 1<<6,
MC = 1<<5,
IDNF = 1<<4,
MCR = 1<<3,
ABRT = 1<<2,
NM = 1<<1,
// status bits
BSY = 1<<7,
DRDY = 1<<6,
DF = 1<<5,
DRQ = 1<<3,
ERR = 1<<0,
};
static ushort ident[256];
static void
setfld(ushort *a, int idx, int len, char *str) // set field in ident
{
uchar *p;
p = (uchar *)(a+idx);
while (len > 0) {
if (*str == 0)
p[1] = ' ';
else
p[1] = *str++;
if (*str == 0)
p[0] = ' ';
else
p[0] = *str++;
p += 2;
len -= 2;
}
}
static void
setlba28(ushort *ident, vlong lba)
{
uchar *cp;
cp = (uchar *) &ident[60];
*cp++ = lba;
*cp++ = lba >>= 8;
*cp++ = lba >>= 8;
*cp++ = (lba >>= 8) & 0xf;
}
static void
setlba48(ushort *ident, vlong lba)
{
uchar *cp;
cp = (uchar *) &ident[100];
*cp++ = lba;
*cp++ = lba >>= 8;
*cp++ = lba >>= 8;
*cp++ = lba >>= 8;
*cp++ = lba >>= 8;
*cp++ = lba >>= 8;
}
static void
setushort(ushort *a, int i, ushort n)
{
uchar *p;
p = (uchar *)(a+i);
*p++ = n & 0xff;
*p++ = n >> 8;
}
void
atainit(void)
{
char buf[64];
setushort(ident, 47, 0x8000);
setushort(ident, 49, 0x0200);
setushort(ident, 50, 0x4000);
setushort(ident, 83, 0x5400);
setushort(ident, 84, 0x4000);
setushort(ident, 86, 0x1400);
setushort(ident, 87, 0x4000);
setushort(ident, 93, 0x400b);
setfld(ident, 27, 40, "Coraid EtherDrive vblade");
sprintf(buf, "V%d", VBLADE_VERSION);
setfld(ident, 23, 8, buf);
setfld(ident, 10, 20, serial);
}
/* The ATA spec is weird in that you specify the device size as number
* of sectors and then address the sectors with an offset. That means
* with LBA 28 you shouldn't see an LBA of all ones. Still, we don't
* check for that.
*/
int
atacmd(Ataregs *p, uchar *dp, int ndp, int payload) // do the ata cmd
{
vlong lba;
ushort *ip;
int n;
enum { MAXLBA28SIZE = 0x0fffffff };
extern int maxscnt;
p->status = 0;
switch (p->cmd) {
default:
p->status = DRDY | ERR;
p->err = ABRT;
return 0;
case 0xe7: // flush cache
return 0;
case 0xec: // identify device
if (p->sectors != 1 || ndp < 512)
return -1;
memmove(dp, ident, 512);
ip = (ushort *)dp;
if (size & ~MAXLBA28SIZE)
setlba28(ip, MAXLBA28SIZE);
else
setlba28(ip, size);
setlba48(ip, size);
p->err = 0;
p->status = DRDY;
p->sectors = 0;
return 0;
case 0xe5: // check power mode
p->err = 0;
p->sectors = 0xff; // the device is active or idle
p->status = DRDY;
return 0;
case 0x20: // read sectors
case 0x30: // write sectors
lba = p->lba & MAXLBA28SIZE;
break;
case 0x24: // read sectors ext
case 0x34: // write sectors ext
lba = p->lba & 0x0000ffffffffffffLL; // full 48
break;
}
// we ought not be here unless we are a read/write
if (p->sectors > maxscnt || p->sectors*512 > ndp)
return -1;
if (lba + p->sectors > size) {
p->err = IDNF;
p->status = DRDY | ERR;
p->lba = lba;
return 0;
}
if (p->cmd == 0x20 || p->cmd == 0x24)
n = getsec(bfd, dp, lba+offset, p->sectors);
else {
// packet should be big enough to contain the data
if (payload < 512 * p->sectors)
return -1;
n = putsec(bfd, dp, lba+offset, p->sectors);
}
n /= 512;
if (n != p->sectors) {
p->err = ABRT;
p->status = ERR;
} else
p->err = 0;
p->status |= DRDY;
p->lba += n;
p->sectors -= n;
return 0;
}

VBLADE/vblade-master/bpf.c Normal file

@ -0,0 +1,127 @@
// bpf.c: bpf packet filter for linux/freebsd
#include "config.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include "dat.h"
#include "fns.h"
struct bpf_insn {
ushort code;
uchar jt;
uchar jf;
u_int32_t k;
};
struct bpf_program {
uint bf_len;
struct bpf_insn *bf_insns;
};
/* instruction classes */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_LD 0x00
#define BPF_LDX 0x01
#define BPF_ST 0x02
#define BPF_STX 0x03
#define BPF_ALU 0x04
#define BPF_JMP 0x05
#define BPF_RET 0x06
#define BPF_MISC 0x07
/* ld/ldx fields */
#define BPF_SIZE(code) ((code) & 0x18)
#define BPF_W 0x00
#define BPF_H 0x08
#define BPF_B 0x10
#define BPF_MODE(code) ((code) & 0xe0)
#define BPF_IMM 0x00
#define BPF_ABS 0x20
#define BPF_IND 0x40
#define BPF_MEM 0x60
#define BPF_LEN 0x80
#define BPF_MSH 0xa0
/* alu/jmp fields */
#define BPF_OP(code) ((code) & 0xf0)
#define BPF_ADD 0x00
#define BPF_SUB 0x10
#define BPF_MUL 0x20
#define BPF_DIV 0x30
#define BPF_OR 0x40
#define BPF_AND 0x50
#define BPF_LSH 0x60
#define BPF_RSH 0x70
#define BPF_NEG 0x80
#define BPF_JA 0x00
#define BPF_JEQ 0x10
#define BPF_JGT 0x20
#define BPF_JGE 0x30
#define BPF_JSET 0x40
#define BPF_SRC(code) ((code) & 0x08)
#define BPF_K 0x00
#define BPF_X 0x08
/* ret - BPF_K and BPF_X also apply */
#define BPF_RVAL(code) ((code) & 0x18)
#define BPF_A 0x10
/* misc */
#define BPF_MISCOP(code) ((code) & 0xf8)
#define BPF_TAX 0x00
#define BPF_TXA 0x80
/* macros for insn array initializers */
#define BPF_STMT(code, k) { (ushort)(code), 0, 0, k }
#define BPF_JUMP(code, k, jt, jf) { (ushort)(code), jt, jf, k }
void *
create_bpf_program(int shelf, int slot)
{
struct bpf_program *bpf_program;
struct bpf_insn insns[] = {
/* CHECKTYPE: Load the type into register */
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
/* Does it match AoE Type (0x88a2)? No, goto INVALID */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x88a2, 0, 10),
/* Load the flags into register */
BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 14),
/* Check to see if the Resp flag is set */
BPF_STMT(BPF_ALU+BPF_AND+BPF_K, Resp),
/* Yes, goto INVALID */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0, 0, 7),
/* CHECKSHELF: Load the shelf number into register */
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 16),
/* Does it match shelf number? Yes, goto CHECKSLOT */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, shelf, 1, 0),
/* Does it match broadcast? No, goto INVALID */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0xffff, 0, 4),
/* CHECKSLOT: Load the slot number into register */
BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 18),
/* Does it match slot number? Yes, goto VALID */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, slot, 1, 0),
/* Does it match broadcast? No, goto INVALID */
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0xff, 0, 1),
/* VALID: return -1 (allow the packet to be read) */
BPF_STMT(BPF_RET+BPF_K, -1),
/* INVALID: return 0 (ignore the packet) */
BPF_STMT(BPF_RET+BPF_K, 0),
};
if ((bpf_program = malloc(sizeof(struct bpf_program))) == NULL
|| (bpf_program->bf_insns = malloc(sizeof(insns))) == NULL) {
perror("malloc");
exit(1);
}
bpf_program->bf_len = sizeof(insns)/sizeof(struct bpf_insn);
memcpy(bpf_program->bf_insns, insns, sizeof(insns));
return (void *)bpf_program;
}
void
free_bpf_program(void *bpf_program)
{
free(((struct bpf_program *) bpf_program)->bf_insns);
free(bpf_program);
}
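
Note on the filter built by `create_bpf_program()`: the instruction list is easier to check against an equivalent C predicate. The sketch below mirrors the same five tests (EtherType, Resp flag, shelf, slot, broadcast values) on a raw Ethernet frame. It is a reading aid, not code from this commit. `aoe_accept` is a hypothetical name, and `AOE_RESP_FLAG` is an assumed value for `Resp`, which is defined in `dat.h` and not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AOE_ETHERTYPE 0x88a2
#define AOE_RESP_FLAG 0x08  /* assumption: value of Resp from dat.h */

/* Return 1 if the frame would pass the filter, 0 otherwise. */
static int aoe_accept(const uint8_t *pkt, size_t len,
                      unsigned shelf, unsigned slot)
{
    unsigned type, maj, min;

    assert(pkt != NULL);
    /* The filter reads bytes 12..18; a classic BPF load past the end
     * of the packet rejects it, so short frames never pass. */
    if (len < 19)
        return 0;

    type = ((unsigned)pkt[12] << 8) | pkt[13];   /* CHECKTYPE */
    if (type != AOE_ETHERTYPE)
        return 0;
    if (pkt[14] & AOE_RESP_FLAG)                 /* responses are ignored */
        return 0;

    maj = ((unsigned)pkt[16] << 8) | pkt[17];    /* CHECKSHELF */
    if (maj != shelf && maj != 0xffff)
        return 0;

    min = pkt[18];                               /* CHECKSLOT */
    if (min != slot && min != 0xff)
        return 0;

    return 1;                                    /* VALID */
}
```

The same checks are repeated in user space by the receive loop in `aoe()`, so the kernel filter only saves the cost of copying uninteresting frames.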


@ -0,0 +1,14 @@
#!/bin/bash
rm -f vblade_*
gcc linux.c aoe.c ata.c bpf.c -Os -o vblade_64
gcc linux.c aoe.c ata.c bpf.c -Os -m32 -o vblade_32
if [ -e vblade_64 ] && [ -e vblade_32 ]; then
echo -e '\n################## SUCCESS ######################\n'
else
echo -e '\n################## FAILED ######################\n'
exit 1
fi


@ -0,0 +1,2 @@
#define _FILE_OFFSET_BITS 64
typedef unsigned long long u64;


@ -0,0 +1,2 @@
#define _FILE_OFFSET_BITS 64
//u64 typedef unsigned long long u64;


@ -0,0 +1,8 @@
#include <stdio.h>
int main(void)
{
u64 n;
printf("%d\n", (int) n+2);
return 0;
}


@ -0,0 +1,12 @@
The patches in the contrib directory enable features
that either don't work completely, aren't well tested, or
are of limited general use. They can be applied by
using patch in the vblade source directory as follows:
forfeit:~/vblade-12 # patch -p1 <contrib/jumbo.diff
patching file aoe.c
patching file fns.h
patching file freebsd.c
patching file linux.c
forfeit:~/vblade-12 #


@ -0,0 +1,37 @@
#!/bin/sh
set -eu
SERVICEFILE="/lib/systemd/system/vblade@.service"
WANTDIR="$1/vblade.service.wants"
CONFIG_DIR=/etc/vblade.conf.d/
if [ -d "$CONFIG_DIR" ] ; then
mkdir -p "$WANTDIR"
cd "$CONFIG_DIR"
for CONFIG in *.conf ; do
[ -f "$CONFIG" ] || continue
INSTANCE="$(systemd-escape "${CONFIG%%.conf}")"
LINK="$WANTDIR/vblade@$INSTANCE.service"
sh -n "$CONFIG_DIR$CONFIG" 2>/dev/null || continue
shelf=
slot=
netif=
filename=
options=
. "$CONFIG_DIR$CONFIG"
[ "$netif" ] || continue
[ "$shelf" ] || continue
[ "$slot" ] || continue
[ "$filename" ] || continue
ln -s "$SERVICEFILE" "$LINK"
done
fi
exit 0


@ -0,0 +1,107 @@
= VBLADE-PERSISTENCE(5)
== NAME
vblade-persistence - description of the vblade persistence
== DESCRIPTION
vblade-persistence uses the files in `/etc/vblade.conf.d/` to manage
exports. File names must end in `.conf`. The "instance" name is the
file name without `.conf`.
The file format is a POSIX shell fragment.
The following variables *must* be defined: `netif`, `shelf`, `slot`,
and `filename`. See vblade(8) for their meaning. Incomplete
configuration files are ignored, as are files that are not valid
shell syntax.
Additionally, the following variables may be defined:
* `options`
Any options as provided by vblade(8).
* `ionice`
Use these to define an I/O scheduling class and level for that export.
The value must be understood by ionice(1).
== EXAMPLE
----
shelf=14
slot=2
netif=ens3
filename=/dev/mapper/export
options='-r -m 11:22:33:44:55:66,22:33:44:55:66:77 -o 8'
ionice='--class best-effort --classdata 7'
----
== USAGE
=== On systems using systemd
Install `vblade-generator` in `/lib/systemd/system-generators/`, and
both `vblade.service` and `vblade@.service` in `/lib/systemd/system/`.
Enable the vblade service, reload systemd. Additional units for each
export should appear, named `vblade@<instance>.service`.
=== On systems using SysV init
Individual instances may be controlled by providing their name as
a second option, e.g.
----
/etc/init.d/vblade status demo
----
Two different init scripts are available:
==== `vblade.init.lsb-daemon`
Uses LSB functions and the daemon(1) program to control the instance.
Pros: daemon(1) is a very fine tool for this, also providing respawning
and output redirection.
==== `vblade.init.daemon`
As above, but without using LSB functions.
Pros: Should be fairly portable, no frills.
==== Template
The template for these scripts is `vblade.init.in`, the actual
templating is done using tpage(1p), see `vblade.init.generate`.
Support for using Debian's start-stop-daemon has been prepared but
requires pid file support in vblade to be usable.
== BUGS
On SysV init systems, the configuration files are always sourced as
shell scripts. On systemd systems, the configuration file is just
a key/value store without shell expansion.
It's a wise idea to run `sh -n` against a configuration file after any
modification for basic format validation.
== SEE ALSO
daemon: <http://www.libslack.org/daemon/>
tpage(1p)
vblade(8)
== AUTHOR
Christoph Biedl <sourceforge.bnwi@manchmal.in-ulm.de>


@ -0,0 +1,191 @@
#!/bin/sh
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="vblade export"
NAME=vblade
VBLADE="/usr/sbin/$NAME"
DAEMON=/usr/bin/daemon
IONICE=/usr/bin/ionice
PIDDIR="/var/run/vblade/"
[ -x "$VBLADE" ] || exit 0
[ -x "$DAEMON" ] || exit 0
mkdir -p "$PIDDIR"
# Emulation of LSB functions
VERBOSE=1
log_daemon_msg () {
printf '%s ' "$@"
}
log_end_msg () {
local CODE="$1"
if [ "$CODE" -eq 0 ] ; then
echo '.'
else
echo 'failed!'
fi
}
# Start a vblade instance
#
# Return
# 0 if daemon has been started
# 1 if daemon was already running
# 2 if daemon could not be started
do_start () {
local INSTANCE="$1"
local CONFIG="$2"
sh -n "$CONFIG" 2>/dev/null || return 2
shelf=
slot=
filename=
netif=
options=
ionice=
. "$CONFIG"
[ "$netif" ] || return 2
[ "$shelf" ] || return 2
[ "$slot" ] || return 2
[ "$filename" ] || return 2
if [ "$ionice" ] ; then
if [ -x "$IONICE" ] ; then
ionice="$IONICE $ionice"
else
ionice=
fi
fi
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
&& return 1
$ionice "$DAEMON" \
--respawn \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
--output daemon.notice \
--stdout daemon.notice \
--stderr daemon.err -- \
$VBLADE $options $shelf $slot $netif $filename || return 2
}
# Stop a vblade instance
#
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
do_stop () {
local INSTANCE="$1"
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" || return 1
"$DAEMON" \
--stop \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
--stop || return 2
# Wait until the process is gone
for i in $(seq 1 10) ; do
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" || return 0
done
return 2
}
EXIT=0
do_action () {
local CONFIG="$1"
INSTANCE="$(basename "${CONFIG%%.conf}")"
case "$ACTION" in
start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$INSTANCE"
do_start "$INSTANCE" "$CONFIG"
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$INSTANCE"
do_stop "$INSTANCE"
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
status)
if "$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR"
then
echo "$DESC instance $INSTANCE is running"
else
echo "$DESC instance $INSTANCE is not running"
EXIT=1
fi
;;
restart|force-reload)
log_daemon_msg "Restarting $DESC" "$INSTANCE"
do_stop "$INSTANCE"
case "$?" in
0|1)
do_start "$INSTANCE" "$CONFIG"
case "$?" in
0) log_end_msg 0 ;;
*)
# Old process is still running or
# failed to start
log_end_msg 1 ;;
esac
;;
*)
# Failed to stop
log_end_msg 1
;;
esac
;;
*)
echo "Usage: /etc/init.d/vblade {start|stop|status|restart|force-reload} [<export> ...]" >&2
exit 3
;;
esac
}
ACTION="$1"
shift
if [ "$1" ] ; then
while [ "$1" ] ; do
CONFIG="/etc/vblade.conf.d/$1.conf"
if [ -f "$CONFIG" ] ; then
do_action "$CONFIG"
fi
shift
done
else
for CONFIG in /etc/vblade.conf.d/*.conf ; do
if [ -f "$CONFIG" ] ; then
do_action "$CONFIG"
fi
done
fi
exit $EXIT


@ -0,0 +1,24 @@
#!/bin/sh
set -e
TEMPDIR="$(mktemp --directory --tmpdir "vblade.init.generate.$$.XXXXX")"
trap "cd / ; rm -rf \"$TEMPDIR\"" EXIT
run () {
local OUTPUT="$1"
echo "I: Processing $OUTPUT"
TEMP="$TEMPDIR/$OUTPUT"
shift
tpage "$@" vblade.init.in>"$TEMP"
sh -n "$TEMP"
if [ -f "$OUTPUT" ] && cmp -s "$TEMP" "$OUTPUT" ; then
echo "I: $OUTPUT is fresh"
else
cp "$TEMP" "$OUTPUT"
fi
}
# run 'vblade.init.debian' --define lsb=1 --define control=ssd
run 'vblade.init.lsb-daemon' --define lsb=1 --define control=daemon
run 'vblade.init.daemon' --define lsb= --define control=daemon


@ -0,0 +1,245 @@
#!/bin/sh
[% IF lsb -%]
### BEGIN INIT INFO
# Provides: vblade
# Required-Start: $remote_fs $syslog $network
# Required-Stop: $remote_fs $syslog $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: vblade exports
# Description: Manage all vblade exports defined in
# /etc/vblade.conf.d/
### END INIT INFO
[% END -%]
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="vblade export"
NAME=vblade
VBLADE="/usr/sbin/$NAME"
[% IF control == 'ssd' -%]
[% PERL -%]die ('control=ssd cannot be used as long as vblade has no pidfile support');[% END -%]
[% ELSIF control == 'daemon' -%]
DAEMON=/usr/bin/daemon
[% END -%]
IONICE=/usr/bin/ionice
PIDDIR="/var/run/vblade/"
[ -x "$VBLADE" ] || exit 0
[% IF control == 'daemon' -%]
[ -x "$DAEMON" ] || exit 0
[% END -%]
mkdir -p "$PIDDIR"
[% IF lsb -%]
# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh
# Define LSB functions
. /lib/lsb/init-functions
[% ELSE -%]
# Emulation of LSB functions
VERBOSE=1
log_daemon_msg () {
printf '%s ' "$@"
}
log_end_msg () {
local CODE="$1"
if [ "$CODE" -eq 0 ] ; then
echo '.'
else
echo 'failed!'
fi
}
[% END -%]
# Start a vblade instance
#
# Return
# 0 if daemon has been started
# 1 if daemon was already running
# 2 if daemon could not be started
do_start () {
local INSTANCE="$1"
local CONFIG="$2"
sh -n "$CONFIG" 2>/dev/null || return 2
shelf=
slot=
filename=
netif=
options=
ionice=
. "$CONFIG"
[ "$netif" ] || return 2
[ "$shelf" ] || return 2
[ "$slot" ] || return 2
[ "$filename" ] || return 2
if [ "$ionice" ] ; then
if [ -x "$IONICE" ] ; then
ionice="$IONICE $ionice"
else
ionice=
fi
fi
[% IF control == 'ssd' -%]
local PIDFILE="$PIDDIR/$INSTANCE.pid"
start-stop-daemon --start --quiet \
--pidfile "$PIDFILE" --exec "$VBLADE" --test > /dev/null \
|| return 1
start-stop-daemon --start --quiet \
--pidfile "$PIDFILE" \
--exec $ionice "$VBLADE" -- \
$shelf $slot $netif $filename $options \
|| return 2
[% ELSIF control == 'daemon' -%]
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
&& return 1
$ionice "$DAEMON" \
--respawn \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
--output daemon.notice \
--stdout daemon.notice \
--stderr daemon.err -- \
$VBLADE $options $shelf $slot $netif $filename || return 2
[% END -%]
}
# Stop a vblade instance
#
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
do_stop () {
local INSTANCE="$1"
[% IF control == 'ssd' -%]
local PIDFILE="$PIDDIR/$INSTANCE.pid"
start-stop-daemon --stop --quiet \
--retry=TERM/30/KILL/5 --pidfile "$PIDFILE" --name "$NAME"
RETVAL="$?"
[ "$RETVAL" = 2 ] && return 2
# Many daemons don't delete their pidfiles when they exit.
rm -f "$PIDFILE"
return "$RETVAL"
[% ELSIF control == 'daemon' -%]
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" || return 1
"$DAEMON" \
--stop \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
--stop || return 2
# Wait until the process is gone
for i in $(seq 1 10) ; do
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" || return 0
done
return 2
[% END -%]
}
EXIT=0
do_action () {
local CONFIG="$1"
INSTANCE="$(basename "${CONFIG%%.conf}")"
case "$ACTION" in
start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$INSTANCE"
do_start "$INSTANCE" "$CONFIG"
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$INSTANCE"
do_stop "$INSTANCE"
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
status)
[% IF lsb -%]
status_of_proc -p "$PIDDIR/$INSTANCE.pid" "$VBLADE" "vblade instance $INSTANCE" || EXIT=$?
[% ELSE -%]
if "$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR"
then
echo "$DESC instance $INSTANCE is running"
else
echo "$DESC instance $INSTANCE is not running"
EXIT=1
fi
[% END -%]
;;
restart|force-reload)
log_daemon_msg "Restarting $DESC" "$INSTANCE"
do_stop "$INSTANCE"
case "$?" in
0|1)
do_start "$INSTANCE" "$CONFIG"
case "$?" in
0) log_end_msg 0 ;;
*)
# Old process is still running or
# failed to start
log_end_msg 1 ;;
esac
;;
*)
# Failed to stop
log_end_msg 1
;;
esac
;;
*)
echo "Usage: /etc/init.d/vblade {start|stop|status|restart|force-reload} [<export> ...]" >&2
exit 3
;;
esac
}
ACTION="$1"
shift
if [ "$1" ] ; then
while [ "$1" ] ; do
CONFIG="/etc/vblade.conf.d/$1.conf"
if [ -f "$CONFIG" ] ; then
do_action "$CONFIG"
fi
shift
done
else
for CONFIG in /etc/vblade.conf.d/*.conf ; do
if [ -f "$CONFIG" ] ; then
do_action "$CONFIG"
fi
done
fi
exit $EXIT


@ -0,0 +1,185 @@
#!/bin/sh
### BEGIN INIT INFO
# Provides: vblade
# Required-Start: $remote_fs $syslog $network
# Required-Stop: $remote_fs $syslog $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: vblade exports
# Description: Manage all vblade exports defined in
# /etc/vblade.conf.d/
### END INIT INFO
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="vblade export"
NAME=vblade
VBLADE="/usr/sbin/$NAME"
DAEMON=/usr/bin/daemon
IONICE=/usr/bin/ionice
PIDDIR="/var/run/vblade/"
[ -x "$VBLADE" ] || exit 0
[ -x "$DAEMON" ] || exit 0
mkdir -p "$PIDDIR"
# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh
# Define LSB functions
. /lib/lsb/init-functions
# Start a vblade instance
#
# Return
# 0 if daemon has been started
# 1 if daemon was already running
# 2 if daemon could not be started
do_start () {
local INSTANCE="$1"
local CONFIG="$2"
sh -n "$CONFIG" 2>/dev/null || return 2
shelf=
slot=
filename=
netif=
options=
ionice=
. "$CONFIG"
[ "$netif" ] || return 2
[ "$shelf" ] || return 2
[ "$slot" ] || return 2
[ "$filename" ] || return 2
if [ "$ionice" ] ; then
if [ -x "$IONICE" ] ; then
ionice="$IONICE $ionice"
else
ionice=
fi
fi
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
&& return 1
$ionice "$DAEMON" \
--respawn \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
--output daemon.notice \
--stdout daemon.notice \
--stderr daemon.err -- \
$VBLADE $options $shelf $slot $netif $filename || return 2
}
# Stop a vblade instance
#
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# 2 if daemon could not be stopped
# other if a failure occurred
do_stop () {
local INSTANCE="$1"
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" || return 1
"$DAEMON" \
--stop \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" \
--stop || return 2
# Wait until the process is gone
for i in $(seq 1 10) ; do
"$DAEMON" \
--running \
--name "$INSTANCE" \
--pidfiles "$PIDDIR" || return 0
done
return 2
}
EXIT=0
do_action () {
local CONFIG="$1"
INSTANCE="$(basename "${CONFIG%%.conf}")"
case "$ACTION" in
start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$INSTANCE"
do_start "$INSTANCE" "$CONFIG"
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$INSTANCE"
do_stop "$INSTANCE"
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
status)
status_of_proc -p "$PIDDIR/$INSTANCE.pid" "$VBLADE" "vblade instance $INSTANCE" || EXIT=$?
;;
restart|force-reload)
log_daemon_msg "Restarting $DESC" "$INSTANCE"
do_stop "$INSTANCE"
case "$?" in
0|1)
do_start "$INSTANCE" "$CONFIG"
case "$?" in
0) log_end_msg 0 ;;
*)
# Old process is still running or
# failed to start
log_end_msg 1 ;;
esac
;;
*)
# Failed to stop
log_end_msg 1
;;
esac
;;
*)
echo "Usage: /etc/init.d/vblade {start|stop|status|restart|force-reload} [<export> ...]" >&2
exit 3
;;
esac
}
ACTION="$1"
shift
if [ "$1" ] ; then
while [ "$1" ] ; do
CONFIG="/etc/vblade.conf.d/$1.conf"
if [ -f "$CONFIG" ] ; then
do_action "$CONFIG"
fi
shift
done
else
for CONFIG in /etc/vblade.conf.d/*.conf ; do
if [ -f "$CONFIG" ] ; then
do_action "$CONFIG"
fi
done
fi
exit $EXIT


@ -0,0 +1,13 @@
[Unit]
Description=vblade exports
Documentation=man:vblade-persistence(5)
Documentation=man:vblade(8)
[Service]
Type=oneshot
ExecStart=/bin/true
ExecReload=/bin/true
RemainAfterExit=on
[Install]
WantedBy=multi-user.target

@ -0,0 +1,18 @@
[Unit]
Description=vblade instance %I
SourcePath=/etc/vblade.conf.d/%I.conf
Documentation=man:vblade(8)
PartOf=vblade.service
After=rc-local.service
[Service]
Type=simple
Environment="ionice=-c2 -n7"
EnvironmentFile=/etc/vblade.conf.d/%I.conf
ExecStart=/usr/bin/ionice $ionice /usr/sbin/vblade $shelf $slot $netif $filename $options
SyslogIdentifier=vblade
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

@ -0,0 +1,31 @@
This proof-of-concept patch modifies vblade to access the underlying block
device using POSIX asynchronous IO (AIO) rather than using normal blocking
read() and write(). AIO allows vblade to receive and queue several ATA
read/write commands at once, returning the response to the client
asynchronously as each IO operation completes. It should be most beneficial
for devices which experience very non-sequential IO. An AIO-enabled vblade is
also a good starting point if you want to generalise vblade to export multiple
devices without the complexity and overhead of a multithreaded approach.
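
The queuing described above can be sketched with the standard POSIX AIO calls. This is a minimal illustration, not code from the patch: the helper names submit_read and finish_read are hypothetical, and it waits with aio_suspend() instead of the SIGIO notification the patch uses. On older glibc it needs -lrt at link time, as the makefile hunk of the patch does.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Queue an asynchronous read of nbytes at byte offset off.
 * Returns 0 once the request is queued, -1 on error (hypothetical helper). */
int
submit_read(struct aiocb *cb, int fd, void *buf, size_t nbytes, off_t off)
{
	memset(cb, 0, sizeof *cb);
	cb->aio_fildes = fd;
	cb->aio_buf = buf;
	cb->aio_nbytes = nbytes;
	cb->aio_offset = off;
	cb->aio_sigevent.sigev_notify = SIGEV_NONE;	/* no signal; we wait below */
	return aio_read(cb);
}

/* Wait for the request to finish, then return the byte count,
 * or -1 with errno set, as read() would have (hypothetical helper). */
ssize_t
finish_read(struct aiocb *cb)
{
	const struct aiocb *list[1] = { cb };
	int err;

	while ((err = aio_error(cb)) == EINPROGRESS)
		aio_suspend(list, 1, NULL);	/* sleep until completion or a signal */
	if (err != 0) {
		(void) aio_return(cb);		/* release the request's resources */
		errno = err;
		return -1;
	}
	return aio_return(cb);
}
```

A server can call submit_read for several requests before calling finish_read on any of them, which is what lets more than one ATA command be in flight at once.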
The patch implements AIO support for both Linux and FreeBSD, but I have not
tested the FreeBSD support and would therefore be especially interested to
hear success/failure reports for compiling and running AIO vblade on FreeBSD.
A SIGIO handler which writes a single byte to a pipe is used to notify the
main poll() loop that AIO operations have completed and are ready to return to
the client. Running oprofile on a box with a heavily loaded loopback
vblade-aio suggests that it spends an inordinate amount of time in the signal
handler. Some method of poll()ing directly on the AIO events at the same time
as the socket fd could cut this overhead out completely.
More generally, experimenting on Linux with standard O_DIRECT vblade and
O_DIRECT vblade-aio on a loopback interface with MTU 9000 suggests that the
performance difference on a single RAID1-backed block device is fairly small:
swamped by the performance of the network and the underlying block device.
However, the POSIX AIO in glibc librt is emulated in userspace threads rather
than using the kernel AIO API. A kernel-backed POSIX AIO implementation should
perform better, especially for multiple accesses to a single block device.
I would be delighted to hear any feedback and experiences from people running
vblade together with this patch.
Chris Webb <chris@arachsys.com>, 2008-04-21.

@ -0,0 +1,538 @@
diff -uprN vblade-17.orig/aoe.c vblade-17/aoe.c
--- vblade-17.orig/aoe.c 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/aoe.c 2008-06-09 11:05:23.000000000 -0400
@@ -8,6 +8,9 @@
#include <sys/stat.h>
#include <fcntl.h>
#include <netinet/in.h>
+#include <errno.h>
+#include <aio.h>
+#include <poll.h>
#include "dat.h"
#include "fns.h"
@@ -22,6 +25,11 @@ char config[Nconfig];
int nconfig = 0;
int maxscnt = 2;
char *ifname;
+int queuepipe[2];
+int pktlen[Nplaces], pending[Nplaces];
+Ata *pkt[Nplaces];
+Ataregs regs[Nplaces];
+struct aiocb aiocb[Nplaces];
void
aoead(int fd) // advertise the virtual blade
@@ -78,32 +86,52 @@ getlba(uchar *p)
}
int
-aoeata(Ata *p, int pktlen) // do ATA reqeust
+aoeata(int place) // do ATA reqeust
{
- Ataregs r;
- int len = 60;
int n;
+ int len = 60; // minimum ethernet packet size
- r.lba = getlba(p->lba);
- r.sectors = p->sectors;
- r.feature = p->err;
- r.cmd = p->cmd;
- if (atacmd(&r, (uchar *)(p+1), maxscnt*512, pktlen - sizeof(*p)) < 0) {
- p->h.flags |= Error;
- p->h.error = BadArg;
+ regs[place].lba = getlba(pkt[place]->lba);
+ regs[place].sectors = pkt[place]->sectors;
+ regs[place].feature = pkt[place]->err;
+ regs[place].cmd = pkt[place]->cmd;
+ n = atacmd(regs + place, (uchar *)(pkt[place] + 1), maxscnt*512,
+ pktlen[place] - sizeof(Ata), aiocb + place);
+ if (n < 0) {
+ pkt[place]->h.flags |= Error;
+ pkt[place]->h.error = BadArg;
return len;
+ } else if (n > 0) {
+ pending[place] = 1;
+ return 0;
+ }
+ if (!(pkt[place]->aflag & Write) && (n = pkt[place]->sectors)) {
+ n -= regs[place].sectors;
+ len = sizeof (Ata) + (n*512);
}
- if (!(p->aflag & Write))
- if ((n = p->sectors)) {
- n -= r.sectors;
+ pkt[place]->sectors = regs[place].sectors;
+ pkt[place]->err = regs[place].err;
+ pkt[place]->cmd = regs[place].status;
+ return len;
+}
+
+int aoeatacomplete(int place, int pktlen)
+{
+ int n;
+ int len = 60; // minimum ethernet packet size
+ atacmdcomplete(regs + place, aiocb + place);
+ if (!(pkt[place]->aflag & Write) && (n = pkt[place]->sectors)) {
+ n -= regs[place].sectors;
len = sizeof (Ata) + (n*512);
}
- p->sectors = r.sectors;
- p->err = r.err;
- p->cmd = r.status;
+ pkt[place]->sectors = regs[place].sectors;
+ pkt[place]->err = regs[place].err;
+ pkt[place]->cmd = regs[place].status;
+ pending[place] = 0;
return len;
}
+
#define QCMD(x) ((x)->vercmd & 0xf)
// yes, this makes unnecessary copies.
@@ -156,8 +184,9 @@ confcmd(Conf *p, int payload) // process
}
void
-doaoe(Aoehdr *p, int n)
+doaoe(int place)
{
+ Aoehdr *p = (Aoehdr *) pkt[place];
int len;
enum { // config query header size
CHDR_SIZ = sizeof(Conf) - sizeof(((Conf *)0)->data),
@@ -165,14 +194,16 @@ doaoe(Aoehdr *p, int n)
switch (p->cmd) {
case ATAcmd:
- if (n < sizeof(Ata))
+ if (pktlen[place] < sizeof(Ata))
+ return;
+ len = aoeata(place);
+ if (len == 0)
return;
- len = aoeata((Ata*)p, n);
break;
case Config:
- if (n < CHDR_SIZ)
+ if (pktlen[place] < CHDR_SIZ)
return;
- len = confcmd((Conf *)p, n - CHDR_SIZ);
+ len = confcmd((Conf *)p, pktlen[place] - CHDR_SIZ);
if (len == 0)
return;
break;
@@ -193,25 +224,129 @@ doaoe(Aoehdr *p, int n)
}
void
+doaoecomplete(int place)
+{
+ Aoehdr *p = (Aoehdr *) pkt[place];
+ int len = aoeatacomplete(place, pktlen[place]);
+ memmove(p->dst, p->src, 6);
+ memmove(p->src, mac, 6);
+ p->maj = htons(shelf);
+ p->min = slot;
+ p->flags |= Resp;
+ if (putpkt(sfd, (uchar *) p, len) == -1) {
+ perror("write to network");
+ exit(1);
+ }
+
+}
+
+// allocate the buffer so that the ata data area
+// is page aligned for o_direct on linux
+
+void *
+bufalloc(void **buf, long len)
+{
+ long psize;
+ unsigned long n;
+
+ psize = sysconf(_SC_PAGESIZE);
+ if (psize == -1) {
+ perror("sysconf");
+ exit(EXIT_FAILURE);
+ }
+ n = len/psize + 3;
+ *buf = malloc(psize * n);
+ if (!*buf) {
+ perror("malloc");
+ exit(EXIT_FAILURE);
+ }
+ n = (unsigned long) *buf;
+ n += psize * 2;
+ n &= ~(psize - 1);
+ return (void *) (n - sizeof (Ata));
+}
+
+void
+sigio(int signo)
+{
+ const char dummy = 0;
+ write(queuepipe[1], &dummy, 1);
+}
+
+void
aoe(void)
{
Aoehdr *p;
- uchar *buf;
- int n, sh;
+ char dummy;
+ int n, place, sh;
enum { bufsz = 1<<16, };
-
- buf = malloc(bufsz);
+ sigset_t mask, oldmask;
+ struct sigaction sigact;
+ struct pollfd pollfds[2];
+ void *freeme[Nplaces];
+
+ for (n = 0; n < Nplaces; n++) {
+ pkt[n] = bufalloc(freeme + n, bufsz);
+ pending[n] = 0;
+ }
aoead(sfd);
+ pipe(queuepipe);
+ fcntl(queuepipe[0], F_SETFL, O_NONBLOCK);
+ fcntl(queuepipe[1], F_SETFL, O_NONBLOCK);
+
+ sigemptyset(&sigact.sa_mask);
+ sigact.sa_flags = 0;
+ sigact.sa_sigaction = (void *) sigio;
+ sigaction(SIGIO, &sigact, NULL);
+
+ sigemptyset(&mask);
+ sigaddset(&mask, SIGIO);
+ sigprocmask(SIG_BLOCK, &mask, &oldmask);
+
+ pollfds[0].fd = queuepipe[0];
+ pollfds[1].fd = sfd;
+ pollfds[0].events = pollfds[1].events = POLLIN;
+
for (;;) {
- n = getpkt(sfd, buf, bufsz);
- if (n < 0) {
+ sigprocmask(SIG_SETMASK, &oldmask, NULL);
+ n = poll(pollfds, 2, 1000);
+ sigprocmask(SIG_BLOCK, &mask, NULL);
+
+ if (n < 0 && errno != EINTR) {
+ perror("poll");
+ continue;
+ } else if (n == 0 || pollfds[0].revents & POLLIN) {
+ while(read(queuepipe[0], &dummy, 1) > 0);
+ for (place = 0; place < Nplaces; place++) {
+ if (!pending[place])
+ continue;
+ if (aio_error(aiocb + place) == EINPROGRESS)
+ continue;
+ doaoecomplete(place);
+ pollfds[1].events = POLLIN;
+ }
+ }
+
+ if ((pollfds[1].revents & POLLIN) == 0)
+ continue;
+
+ for (place = 0; place < Nplaces && pending[place]; place++);
+ if (place >= Nplaces) {
+ pollfds[1].events = 0;
+ continue;
+ }
+
+ pktlen[place] = getpkt(sfd, (uchar *) pkt[place], bufsz);
+ if (pktlen[place] < 0) {
+ if (errno == EINTR)
+ continue;
perror("read network");
exit(1);
}
- if (n < sizeof(Aoehdr))
+ if (pktlen[place] < sizeof(Aoehdr))
continue;
- p = (Aoehdr *) buf;
+ p = (Aoehdr *) pkt[place];
if (ntohs(p->type) != 0x88a2)
continue;
if (p->flags & Resp)
@@ -223,9 +358,10 @@ aoe(void)
continue;
if (nmasks && !maskok(p->src))
continue;
- doaoe(p, n);
+ doaoe(place);
}
- free(buf);
+ for (place = 0; place < Nplaces; place++)
+ free(freeme[place]);
}
void
@@ -317,7 +453,7 @@ main(int argc, char **argv)
}
if (s.st_mode & (S_IWUSR|S_IWGRP|S_IWOTH))
omode = O_RDWR;
- bfd = open(argv[3], omode);
+ bfd = opendisk(argv[3], omode);
if (bfd == -1) {
perror("open");
exit(1);
diff -uprN vblade-17.orig/ata.c vblade-17/ata.c
--- vblade-17.orig/ata.c 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/ata.c 2008-06-09 11:05:23.000000000 -0400
@@ -3,6 +3,8 @@
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
+#include <errno.h>
+#include <aio.h>
#include "dat.h"
#include "fns.h"
@@ -98,7 +100,7 @@ atainit(void)
* check for that.
*/
int
-atacmd(Ataregs *p, uchar *dp, int ndp, int payload) // do the ata cmd
+atacmd(Ataregs *p, uchar *dp, int ndp, int payload, struct aiocb *aiocb) // do the ata cmd
{
vlong lba;
ushort *ip;
@@ -155,14 +157,29 @@ atacmd(Ataregs *p, uchar *dp, int ndp, i
return 0;
}
if (p->cmd == 0x20 || p->cmd == 0x24)
- n = getsec(bfd, dp, lba, p->sectors);
+ n = getsec(bfd, dp, lba, p->sectors, aiocb);
else {
// packet should be big enough to contain the data
if (payload < 512 * p->sectors)
return -1;
- n = putsec(bfd, dp, lba, p->sectors);
+ n = putsec(bfd, dp, lba, p->sectors, aiocb);
}
- n /= 512;
+ if (n < 0) {
+ p->err = ABRT;
+ p->status = ERR|DRDY;
+ p->lba += n;
+ p->sectors -= n;
+ return 0;
+ }
+ return 1; // callback expected
+}
+
+
+int
+atacmdcomplete(Ataregs *p, struct aiocb *aiocb) // complete the ata cmd
+{
+ int n;
+ n = aio_return(aiocb) / 512;
if (n != p->sectors) {
p->err = ABRT;
p->status = ERR;
@@ -173,4 +190,3 @@ atacmd(Ataregs *p, uchar *dp, int ndp, i
p->sectors -= n;
return 0;
}
-
diff -uprN vblade-17.orig/dat.h vblade-17/dat.h
--- vblade-17.orig/dat.h 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/dat.h 2008-06-09 11:05:23.000000000 -0400
@@ -111,6 +111,8 @@ enum {
Nconfig = 1024,
Bufcount = 16,
+
+ Nplaces = 32,
};
int shelf, slot;
diff -uprN vblade-17.orig/fns.h vblade-17/fns.h
--- vblade-17.orig/fns.h 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/fns.h 2008-06-09 11:07:21.000000000 -0400
@@ -15,7 +15,8 @@ int maskok(uchar *);
// ata.c
void atainit(void);
-int atacmd(Ataregs *, uchar *, int, int);
+int atacmd(Ataregs *, uchar *, int, int, struct aiocb *);
+int atacmdcomplete(Ataregs *, struct aiocb *);
// bpf.c
@@ -26,8 +27,9 @@ void free_bpf_program(void *);
int dial(char *);
int getea(int, char *, uchar *);
-int putsec(int, uchar *, vlong, int);
-int getsec(int, uchar *, vlong, int);
+int opendisk(const char *, int);
+int putsec(int, uchar *, vlong, int, struct aiocb *);
+int getsec(int, uchar *, vlong, int, struct aiocb *);
int putpkt(int, uchar *, int);
int getpkt(int, uchar *, int);
vlong getsize(int);
diff -uprN vblade-17.orig/freebsd.c vblade-17/freebsd.c
--- vblade-17.orig/freebsd.c 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/freebsd.c 2008-06-09 11:05:23.000000000 -0400
@@ -209,19 +209,40 @@ getea(int s, char *eth, uchar *ea)
return(0);
}
-
int
-getsec(int fd, uchar *place, vlong lba, int nsec)
+opendisk(const char *disk, int omode)
{
- return pread(fd, place, nsec * 512, lba * 512);
+ return open(disk, omode);
}
int
-putsec(int fd, uchar *place, vlong lba, int nsec)
-{
- return pwrite(fd, place, nsec * 512, lba * 512);
+getsec(int fd, uchar *place, vlong lba, int nsec, struct aiocb *aiocb)
+{
+ bzero((char *) aiocb, sizeof(struct aiocb));
+ aiocb->aio_fildes = fd;
+ aiocb->aio_buf = place;
+ aiocb->aio_nbytes = nsec * 512;
+ aiocb->aio_offset = lba * 512;
+ aiocb->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
+ aiocb->aio_sigevent.sigev_signo = SIGIO;
+ aiocb->aio_sigevent.sigev_value.sival_ptr = aiocb;
+ return aio_read(aiocb);
}
+int
+putsec(int fd, uchar *place, vlong lba, int nsec, struct aiocb *aiocb)
+{
+ bzero((char *) aiocb, sizeof(struct aiocb));
+ aiocb->aio_fildes = fd;
+ aiocb->aio_buf = place;
+ aiocb->aio_nbytes = nsec * 512;
+ aiocb->aio_offset = lba * 512;
+ aiocb->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
+ aiocb->aio_sigevent.sigev_signo = SIGIO;
+ aiocb->aio_sigevent.sigev_value.sival_ptr = aiocb;
+ return aio_write(aiocb);
+}
+
static int pktn = 0;
static uchar *pktbp = NULL;
diff -uprN vblade-17.orig/linux.c vblade-17/linux.c
--- vblade-17.orig/linux.c 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/linux.c 2008-06-09 11:05:23.000000000 -0400
@@ -1,5 +1,6 @@
// linux.c: low level access routines for Linux
#include "config.h"
+#define _GNU_SOURCE
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
@@ -22,6 +23,9 @@
#include <netinet/in.h>
#include <linux/fs.h>
#include <sys/stat.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <aio.h>
#include "dat.h"
#include "fns.h"
@@ -29,8 +33,6 @@
int getindx(int, char *);
int getea(int, char *, uchar *);
-
-
int
dial(char *eth) // get us a raw connection to an interface
{
@@ -84,7 +86,7 @@ getea(int s, char *name, uchar *ea)
struct ifreq xx;
int n;
- strcpy(xx.ifr_name, name);
+ strcpy(xx.ifr_name, name);
n = ioctl(s, SIOCGIFHWADDR, &xx);
if (n == -1) {
perror("Can't get hw addr");
@@ -110,17 +112,37 @@ getmtu(int s, char *name)
}
int
-getsec(int fd, uchar *place, vlong lba, int nsec)
+opendisk(const char *disk, int omode)
+{
+ return open(disk, omode|O_DIRECT);
+}
+
+int
+getsec(int fd, uchar *place, vlong lba, int nsec, struct aiocb *aiocb)
{
- lseek(fd, lba * 512, 0);
- return read(fd, place, nsec * 512);
+ bzero((char *) aiocb, sizeof(struct aiocb));
+ aiocb->aio_fildes = fd;
+ aiocb->aio_buf = place;
+ aiocb->aio_nbytes = nsec * 512;
+ aiocb->aio_offset = lba * 512;
+ aiocb->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
+ aiocb->aio_sigevent.sigev_signo = SIGIO;
+ aiocb->aio_sigevent.sigev_value.sival_ptr = aiocb;
+ return aio_read(aiocb);
}
int
-putsec(int fd, uchar *place, vlong lba, int nsec)
+putsec(int fd, uchar *place, vlong lba, int nsec, struct aiocb *aiocb)
{
- lseek(fd, lba * 512, 0);
- return write(fd, place, nsec * 512);
+ bzero((char *) aiocb, sizeof(struct aiocb));
+ aiocb->aio_fildes = fd;
+ aiocb->aio_buf = place;
+ aiocb->aio_nbytes = nsec * 512;
+ aiocb->aio_offset = lba * 512;
+ aiocb->aio_sigevent.sigev_notify = SIGEV_SIGNAL;
+ aiocb->aio_sigevent.sigev_signo = SIGIO;
+ aiocb->aio_sigevent.sigev_value.sival_ptr = aiocb;
+ return aio_write(aiocb);
}
int
diff -uprN vblade-17.orig/linux.h vblade-17/linux.h
--- vblade-17.orig/linux.h 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/linux.h 2008-06-09 11:05:23.000000000 -0400
@@ -6,6 +6,6 @@ typedef long long vlong;
int dial(char *);
int getindx(int, char *);
int getea(int, char *, uchar *);
-int getsec(int, uchar *, vlong, int);
-int putsec(int, uchar *, vlong, int);
+int getsec(int, uchar *, vlong, int, struct aiocb *);
+int putsec(int, uchar *, vlong, int, struct aiocb *);
vlong getsize(int);
diff -uprN vblade-17.orig/makefile vblade-17/makefile
--- vblade-17.orig/makefile 2008-06-09 10:53:07.000000000 -0400
+++ vblade-17/makefile 2008-06-09 11:05:23.000000000 -0400
@@ -13,7 +13,7 @@ CFLAGS += -Wall -g -O2
CC = gcc
vblade: $O
- ${CC} -o vblade $O
+ ${CC} -lrt -o vblade $O
aoe.o : aoe.c config.h dat.h fns.h makefile
${CC} ${CFLAGS} -c $<

174
VBLADE/vblade-master/dat.h Normal file
@ -0,0 +1,174 @@
/* dat.h: include file for vblade AoE target */
#define nil ((void *)0)
/*
* tunable variables
*/
enum {
VBLADE_VERSION = 24,
// Firmware version
FWV = 0x4000 + VBLADE_VERSION,
};
#undef major
#undef minor
#undef makedev
#define major(x) ((x) >> 24 & 0xFF)
#define minor(x) ((x) & 0xffffff)
#define makedev(x, y) ((x) << 24 | (y))
typedef unsigned char uchar;
//typedef unsigned short ushort;
#ifdef __FreeBSD__
typedef unsigned long ulong;
#else
//typedef unsigned long ulong;
#endif
typedef long long vlong;
typedef struct Aoehdr Aoehdr;
typedef struct Ata Ata;
typedef struct Conf Conf;
typedef struct Ataregs Ataregs;
typedef struct Mdir Mdir;
typedef struct Aoemask Aoemask;
typedef struct Aoesrr Aoesrr;
struct Ataregs
{
vlong lba;
uchar cmd;
uchar status;
uchar err;
uchar feature;
uchar sectors;
};
struct Aoehdr
{
uchar dst[6];
uchar src[6];
ushort type;
uchar flags;
uchar error;
ushort maj;
uchar min;
uchar cmd;
uchar tag[4];
};
struct Ata
{
Aoehdr h;
uchar aflag;
uchar err;
uchar sectors;
uchar cmd;
uchar lba[6];
uchar resvd[2];
};
struct Conf
{
Aoehdr h;
ushort bufcnt;
ushort firmware;
uchar scnt;
uchar vercmd;
ushort len;
uchar data[1024];
};
// mask directive
struct Mdir {
uchar res;
uchar cmd;
uchar mac[6];
};
struct Aoemask {
Aoehdr h;
uchar res;
uchar cmd;
uchar merror;
uchar nmacs;
// struct Mdir m[0];
};
struct Aoesrr {
Aoehdr h;
uchar rcmd;
uchar nmacs;
// uchar mac[6][nmacs];
};
enum {
AoEver = 1,
ATAcmd = 0, // command codes
Config,
Mask,
Resrel,
Resp = (1<<3), // flags
Error = (1<<2),
BadCmd = 1,
BadArg,
DevUnavailable,
ConfigErr,
BadVersion,
Res,
Write = (1<<0),
Async = (1<<1),
Device = (1<<4),
Extend = (1<<6),
Qread = 0,
Qtest,
Qprefix,
Qset,
Qfset,
Nretries = 3,
Nconfig = 1024,
Bufcount = 16,
/* mask commands */
Mread= 0,
Medit,
/* mask directives */
MDnop= 0,
MDadd,
MDdel,
/* mask errors */
MEunspec= 1,
MEbaddir,
MEfull,
/* header sizes, including aoe hdr */
Naoehdr= 24,
Natahdr= Naoehdr + 12,
Ncfghdr= Naoehdr + 8,
Nmaskhdr= Naoehdr + 4,
Nsrrhdr= Naoehdr + 2,
Nserial= 20,
};
int shelf, slot;
ulong aoetag;
uchar mac[6];
int bfd; // block file descriptor
int sfd; // socket file descriptor
vlong size; // size of vblade
vlong offset;
char *progname;
char serial[Nserial+1];
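
The header-size constants in dat.h (Naoehdr, Natahdr) are meant to equal the sizes of the corresponding structs on the wire. A small sketch of how that could be checked at compile time follows. It uses local copies of two of the structs, spelled with plain C types, and the helper atapayload is hypothetical.

```c
#include <stddef.h>

/* Local copies of two structs from dat.h, for illustration only. */
struct Aoehdr {
	unsigned char dst[6];
	unsigned char src[6];
	unsigned short type;
	unsigned char flags;
	unsigned char error;
	unsigned short maj;
	unsigned char min;
	unsigned char cmd;
	unsigned char tag[4];
};

struct Ata {
	struct Aoehdr h;
	unsigned char aflag;
	unsigned char err;
	unsigned char sectors;
	unsigned char cmd;
	unsigned char lba[6];
	unsigned char resvd[2];
};

enum {
	Naoehdr = 24,
	Natahdr = Naoehdr + 12,
};

/* Fail the build if a compiler ever inserts padding into the wire structs. */
_Static_assert(sizeof(struct Aoehdr) == Naoehdr, "Aoehdr must be 24 bytes");
_Static_assert(sizeof(struct Ata) == Natahdr, "Ata must be 36 bytes");
_Static_assert(offsetof(struct Ata, lba) == Naoehdr + 4,
	"lba follows the four ATA register bytes");

/* Hypothetical helper: bytes of ATA payload in a received frame of
 * pktlen bytes, or -1 if the frame is too short to hold an Ata header. */
int
atapayload(int pktlen)
{
	return pktlen < Natahdr ? -1 : pktlen - Natahdr;
}
```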

@ -0,0 +1,35 @@
// fns.h: function prototypes
// aoe.c
void aoe(void);
void aoeinit(void);
void aoequery(void);
void aoeconfig(void);
void aoead(int);
void aoeflush(int, int);
void aoetick(void);
void aoerequest(int, int, vlong, int, uchar *, int);
int maskok(uchar *);
int rrok(uchar *);
// ata.c
void atainit(void);
int atacmd(Ataregs *, uchar *, int, int);
// bpf.c
void * create_bpf_program(int, int);
void free_bpf_program(void *);
// os specific
int dial(char *, int);
int getea(int, char *, uchar *);
int putsec(int, uchar *, vlong, int);
int getsec(int, uchar *, vlong, int);
int putpkt(int, uchar *, int);
int getpkt(int, uchar *, int);
vlong getsize(int);
int getmtu(int, char *);

@ -0,0 +1,313 @@
/*
* Copyright (c) 2005, Stacey Son <sson (at) verio (dot) net>
* All rights reserved.
*/
// freebsd.c: low level access routines for FreeBSD
#include "config.h"
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <net/ethernet.h>
#include <net/bpf.h>
#include <net/if.h>
#include <net/if_arp.h>
#include <net/if_dl.h>
#include <net/route.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <net/if.h>
#include <sys/stat.h>
#include <sys/disk.h>
#include <sys/select.h>
#include <sys/sysctl.h>
#include <fcntl.h>
#include <errno.h>
#include "dat.h"
#include "fns.h"
#define BPF_DEV "/dev/bpf0"
/* Packet buffer for getpkt() */
static uchar *pktbuf = NULL;
static int pktbufsz = 0;
int
dial(char *eth, int bufcnt)
{
char m;
int fd = -1;
struct bpf_version bv;
u_int v;
unsigned bufsize, linktype;
char device[sizeof BPF_DEV];
struct ifreq ifr;
struct bpf_program *bpf_program = create_bpf_program(shelf, slot);
strncpy(device, BPF_DEV, sizeof BPF_DEV);
/* find a bpf device we can use, check /dev/bpf[0-9] */
for (m = '0'; m <= '9'; m++) {
device[sizeof(BPF_DEV)-2] = m;
if ((fd = open(device, O_RDWR)) >= 0)
break;
}
if (fd < 0) {
perror("open");
return -1;
}
if (ioctl(fd, BIOCVERSION, &bv) < 0) {
perror("BIOCVERSION");
goto bad;
}
if (bv.bv_major != BPF_MAJOR_VERSION ||
bv.bv_minor < BPF_MINOR_VERSION) {
fprintf(stderr,
"kernel bpf filter out of date\n");
goto bad;
}
/*
* Try finding a good size for the buffer; 65536 may be too
* big, so keep cutting it in half until we find a size
* that works, or run out of sizes to try.
*
*/
for (v = 65536; v != 0; v >>= 1) {
(void) ioctl(fd, BIOCSBLEN, (caddr_t)&v);
(void)strncpy(ifr.ifr_name, eth,
sizeof(ifr.ifr_name));
if (ioctl(fd, BIOCSETIF, (caddr_t)&ifr) >= 0)
break; /* that size worked; we're done */
if (errno != ENOBUFS) {
fprintf(stderr, "BIOCSETIF: %s: %s\n",
eth, strerror(errno));
goto bad;
}
}
if (v == 0) {
fprintf(stderr,
"BIOCSBLEN: %s: No buffer size worked\n", eth);
goto bad;
}
/* Allocate memory for the packet buffer */
pktbufsz = v;
if ((pktbuf = malloc(pktbufsz)) == NULL) {
perror("malloc");
goto bad;
}
/* Don't wait for buffer to be full or timeout */
v = 1;
if (ioctl(fd, BIOCIMMEDIATE, &v) < 0) {
perror("BIOCIMMEDIATE");
goto bad;
}
/* Only read incoming packets */
v = 0;
if (ioctl(fd, BIOCSSEESENT, &v) < 0) {
perror("BIOCSSEESENT");
goto bad;
}
/* Don't complete ethernet hdr */
v = 1;
if (ioctl(fd, BIOCSHDRCMPLT, &v) < 0) {
perror("BIOCSHDRCMPLT");
goto bad;
}
/* Get the data link layer type. */
if (ioctl(fd, BIOCGDLT, (caddr_t)&v) < 0) {
perror("BIOCGDLT");
goto bad;
}
linktype = v;
/* Get the filter buf size */
if (ioctl(fd, BIOCGBLEN, (caddr_t)&v) < 0) {
perror("BIOCGBLEN");
goto bad;
}
bufsize = v;
if (ioctl(fd, BIOCSETF, (caddr_t)bpf_program) < 0) {
perror("BIOCSETF");
goto bad;
}
free_bpf_program(bpf_program);
return(fd);
bad:
free_bpf_program(bpf_program);
close(fd);
return(-1);
}
int
getea(int s, char *eth, uchar *ea)
{
int mib[6];
size_t len;
char *buf, *next, *end;
struct if_msghdr *ifm;
struct sockaddr_dl *sdl;
mib[0] = CTL_NET; mib[1] = AF_ROUTE;
mib[2] = 0; mib[3] = AF_LINK;
mib[4] = NET_RT_IFLIST; mib[5] = 0;
if (sysctl(mib, 6, NULL, &len, NULL, 0) < 0) {
return (-1);
}
if (!(buf = (char *) malloc(len))) {
return (-1);
}
if (sysctl(mib, 6, buf, &len, NULL, 0) < 0) {
free(buf);
return (-1);
}
end = buf + len;
for (next = buf; next < end; next += ifm->ifm_msglen) {
ifm = (struct if_msghdr *)next;
if (ifm->ifm_type == RTM_IFINFO) {
sdl = (struct sockaddr_dl *)(ifm + 1);
if (strncmp(&sdl->sdl_data[0], eth,
sdl->sdl_nlen) == 0) {
memcpy(ea, LLADDR(sdl), ETHER_ADDR_LEN);
break;
}
}
}
free(buf);
return(0);
}
#if 0
int
getsec(int fd, uchar *place, vlong lba, int nsec)
{
return pread(fd, place, nsec * 512, lba * 512);
}
int
putsec(int fd, uchar *place, vlong lba, int nsec)
{
return pwrite(fd, place, nsec * 512, lba * 512);
}
#endif
static int pktn = 0;
static uchar *pktbp = NULL;
int
getpkt(int fd, uchar *buf, int sz)
{
register struct bpf_hdr *bh;
register int pktlen, retlen;
if (pktn <= 0) {
if ((pktn = read(fd, pktbuf, pktbufsz)) < 0) {
perror("read");
exit(1);
}
pktbp = pktbuf;
}
bh = (struct bpf_hdr *) pktbp;
retlen = (int) bh->bh_caplen;
/* This memcpy() is currently needed */
memcpy(buf, (void *)(pktbp + bh->bh_hdrlen),
retlen > sz ? sz : retlen);
pktlen = bh->bh_hdrlen + bh->bh_caplen;
pktbp = pktbp + BPF_WORDALIGN(pktlen);
pktn -= (int) BPF_WORDALIGN(pktlen);
return retlen;
}
int
putpkt(int fd, uchar *buf, int sz)
{
return write(fd, buf, sz);
}
int
getmtu(int fd, char *name)
{
struct ifreq xx;
int s, n, p;
s = socket(AF_INET, SOCK_RAW, 0);
if (s == -1) {
perror("Can't get mtu");
return 1500;
}
xx.ifr_addr.sa_family = AF_INET;
snprintf(xx.ifr_name, sizeof xx.ifr_name, "%s", name);
n = ioctl(s, SIOCGIFMTU, &xx);
if (n == -1) {
perror("Can't get mtu");
close(s); // don't leak the probe socket on failure
return 1500;
}
close(s);
// FreeBSD bpf writes are capped at one PAGESIZE'd mbuf. As such we must
// limit our sector count. See FreeBSD PR 205164, OpenAoE/vblade #7.
p = getpagesize();
if (xx.ifr_mtu > p) {
return p;
}
return xx.ifr_mtu;
}
vlong
getsize(int fd)
{
off_t media_size;
vlong size;
struct stat s;
int n;
// Try getting disklabel from block dev
if ((n = ioctl(fd, DIOCGMEDIASIZE, &media_size)) != -1) {
size = media_size;
} else {
// must not be a block special dev
if (fstat(fd, &s) == -1) {
perror("getsize");
exit(1);
}
size = s.st_size;
}
printf("ioctl returned %d\n", n);
printf("%lld bytes\n", size);
return size;
}

@ -0,0 +1,166 @@
// linux.c: low level access routines for Linux
#define _GNU_SOURCE
#include "config.h"
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <features.h> /* for the glibc version number */
#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 1
#include <netpacket/packet.h>
#include <net/ethernet.h> /* the L2 protocols */
#else
#include <asm/types.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h> /* The L2 protocols */
#endif
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <net/if.h>
#include <netinet/in.h>
#include <linux/fs.h>
#include <sys/stat.h>
#include "dat.h"
#include "fns.h"
int getindx(int, char *);
int getea(int, char *, uchar *);
int
dial(char *eth, int bufcnt) // get us a raw connection to an interface
{
int i, n, s;
struct sockaddr_ll sa;
enum { aoe_type = 0x88a2 };
memset(&sa, 0, sizeof sa);
s = socket(PF_PACKET, SOCK_RAW, htons(aoe_type));
if (s == -1) {
perror("got bad socket");
return -1;
}
i = getindx(s, eth);
if (i < 0) {
perror(eth);
return -1;
}
sa.sll_family = AF_PACKET;
sa.sll_protocol = htons(0x88a2);
sa.sll_ifindex = i;
n = bind(s, (struct sockaddr *)&sa, sizeof sa);
if (n == -1) {
perror("bind funky");
return -1;
}
struct bpf_program {
ulong bf_len;
void *bf_insns;
} *bpf_program = create_bpf_program(shelf, slot);
setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, bpf_program, sizeof(*bpf_program));
free_bpf_program(bpf_program);
n = bufcnt * getmtu(s, eth);
if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &n, sizeof(n)) < 0)
perror("setsockopt SOL_SOCKET, SO_SNDBUF");
if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &n, sizeof(n)) < 0)
perror("setsockopt SOL_SOCKET, SO_RCVBUF");
return s;
}
int
getindx(int s, char *name) // return the index of device 'name'
{
struct ifreq xx;
int n;
snprintf(xx.ifr_name, sizeof xx.ifr_name, "%s", name);
n = ioctl(s, SIOCGIFINDEX, &xx);
if (n == -1)
return -1;
return xx.ifr_ifindex;
}
int
getea(int s, char *name, uchar *ea)
{
struct ifreq xx;
int n;
snprintf(xx.ifr_name, sizeof xx.ifr_name, "%s", name);
n = ioctl(s, SIOCGIFHWADDR, &xx);
if (n == -1) {
perror("Can't get hw addr");
return 0;
}
memmove(ea, xx.ifr_hwaddr.sa_data, 6);
return 1;
}
int
getmtu(int s, char *name)
{
struct ifreq xx;
int n;
snprintf(xx.ifr_name, sizeof xx.ifr_name, "%s", name);
n = ioctl(s, SIOCGIFMTU, &xx);
if (n == -1) {
perror("Can't get mtu");
return 1500;
}
return xx.ifr_mtu;
}
#if 0
int
getsec(int fd, uchar *place, vlong lba, int nsec)
{
return pread(fd, place, nsec * 512, lba * 512);
}
int
putsec(int fd, uchar *place, vlong lba, int nsec)
{
return pwrite(fd, place, nsec * 512, lba * 512);
}
#endif
int
getpkt(int fd, uchar *buf, int sz)
{
return read(fd, buf, sz);
}
int
putpkt(int fd, uchar *buf, int sz)
{
return write(fd, buf, sz);
}
vlong
getsize(int fd)
{
vlong size;
struct stat s;
int n;
n = ioctl(fd, BLKGETSIZE64, &size);
if (n == -1) { // must not be a block special
n = fstat(fd, &s);
if (n == -1) {
perror("getsize");
exit(1);
}
size = s.st_size;
}
return size;
}
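
The disabled getsec()/putsec() pair above maps a sector address to a byte offset and hands it to pread()/pwrite(). A minimal sketch of the same idea that returns a sector count rather than a byte count follows. The names readsectors, writesectors and Sectorsz are hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

enum { Sectorsz = 512 };

/* Read nsec sectors starting at sector lba.
 * Returns the number of whole sectors read, or -1 on error. */
int
readsectors(int fd, unsigned char *buf, long long lba, int nsec)
{
	ssize_t n = pread(fd, buf, (size_t) nsec * Sectorsz, (off_t) lba * Sectorsz);

	return n < 0 ? -1 : (int) (n / Sectorsz);
}

/* Write nsec sectors starting at sector lba.
 * Returns the number of whole sectors written, or -1 on error. */
int
writesectors(int fd, const unsigned char *buf, long long lba, int nsec)
{
	ssize_t n = pwrite(fd, buf, (size_t) nsec * Sectorsz, (off_t) lba * Sectorsz);

	return n < 0 ? -1 : (int) (n / Sectorsz);
}
```

Because pread() and pwrite() take the offset as an argument, no separate lseek() call is needed and the file position is left untouched.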

@ -0,0 +1,11 @@
// linux.h: header for linux.c
typedef unsigned char uchar;
typedef long long vlong;
int dial(char *, int);
int getindx(int, char *);
int getea(int, char *, uchar *);
int getsec(int, uchar *, vlong, int);
int putsec(int, uchar *, vlong, int);
vlong getsize(int);

@ -0,0 +1,43 @@
# makefile for vblade
# see README for others
PLATFORM=linux
prefix = /usr
sbindir = ${prefix}/sbin
sharedir = ${prefix}/share
mandir = ${sharedir}/man
O=aoe.o bpf.o ${PLATFORM}.o ata.o
CFLAGS += -Wall -g -O2
CC = gcc
vblade: $O
${CC} -o vblade $O
aoe.o : aoe.c config.h dat.h fns.h makefile
${CC} ${CFLAGS} -c $<
${PLATFORM}.o : ${PLATFORM}.c config.h dat.h fns.h makefile
${CC} ${CFLAGS} -c $<
ata.o : ata.c config.h dat.h fns.h makefile
${CC} ${CFLAGS} -c $<
bpf.o : bpf.c
${CC} ${CFLAGS} -c $<
config.h : config/config.h.in makefile
@if ${CC} ${CFLAGS} config/u64.c > /dev/null 2>&1; then \
sh -xc "cp config/config.h.in config.h"; \
else \
sh -xc "sed 's!^//u64 !!' config/config.h.in > config.h"; \
fi
clean :
rm -f $O vblade
install : vblade vbladed
install vblade ${sbindir}/
install vbladed ${sbindir}/
install vblade.8 ${mandir}/man8/

@ -0,0 +1,36 @@
#! /bin/sh
# sparsefile - create sparse files conveniently
#
# depends on dd and dc commands.
usage() {
echo "usage: `basename $0` {10M|10G|10T} {filename}" 1>&2
}
size=$1
if test "$size" = "-h"; then
usage
exit
fi
fnam=$2
die() {
usage
exit 1
}
set -e
units=`echo "$size" | sed 's!.*\(.\)$!\1!'`
n=`echo "$size" | sed 's!\(.*\).$!\1!'`
test "$units" && test "$n" && test "$units" != "$n" || die
case "$units" in
M)
seek=`echo "$n 1024 * 1 - p" | dc` ;;
G)
seek=`echo "$n 1024 1024 * * 1 - p" | dc` ;;
T)
seek=`echo "$n 1024 1024 1024 * * * 1 - p" | dc` ;;
*)
die
;;
esac
sh -xc "dd bs=1k count=1 if=/dev/zero of=$fnam seek=$seek"
ls -lh "$fnam"
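
The seek arithmetic in the script above works in 1 KiB blocks: dd writes a single block at position seek, so the file ends up seek + 1 blocks long. The same calculation as a short C sketch, where the function name sparse_seek is hypothetical:

```c
/* Number of 1 KiB blocks to skip so that writing one more block
 * yields a file of n units, where unit is 'M', 'G' or 'T'.
 * Returns -1 for an unknown unit, where the script would print usage. */
long long
sparse_seek(long long n, char unit)
{
	long long blocks;	/* 1 KiB blocks per unit */

	switch (unit) {
	case 'M':
		blocks = 1024LL;
		break;
	case 'G':
		blocks = 1024LL * 1024;
		break;
	case 'T':
		blocks = 1024LL * 1024 * 1024;
		break;
	default:
		return -1;
	}
	return n * blocks - 1;
}
```

For a 10M request this gives 10239, so the single block written by dd lands at block 10239 and the file is exactly 10240 KiB long.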

@ -0,0 +1,93 @@
.TH vblade 8
.SH NAME
vblade, vbladed \- export data via ATA over Ethernet
.SH SYNOPSIS
.nf
.B vblade [ -b bufcnt ] [ -d ] [ -s ] [ -r ] [ -m mac[,mac...] ] [ -o offset ] [ -l length ] shelf slot netif filename
.fi
.SH DESCRIPTION
The
.I vblade
command starts a process that uses raw sockets to perform ATA over
Ethernet, acting like a virtual EtherDrive (R) blade.
.PP
The
.I vbladed
script can be used to daemonize the vblade process,
detaching it from your terminal and sending its output to the system
logs.
.SS Arguments
.TP
\fBshelf\fP
This should be the shelf address (major AoE address) of the AoE device
to create.
.TP
\fBslot\fP
This should be the slot address (minor AoE address) of the AoE device
to create.
.TP
\fBnetif\fP
The name of the ethernet network interface to use for AoE
communications.
.TP
\fBfilename\fP
The name of the regular file or block device to export.
.SS Options
.TP
\fB-b\fP
The \-b flag takes an argument, the advertised buffer count, specifying
the maximum number of outstanding messages the server can queue for
processing.
.TP
\fB-d\fP
The \-d flag selects O_DIRECT mode for accessing the underlying block
device.
.TP
\fB-s\fP
The \-s flag selects O_SYNC mode for accessing the underlying block
device, so all writes are committed to disk before returning to the
client.
.TP
\fB-r\fP
The \-r flag restricts the export of the device to be read-only.
.TP
\fB-m\fP
The \-m flag takes an argument, a comma separated list of MAC addresses
permitted access to the vblade. A MAC address can be specified in upper
or lower case, with or without colons.
.TP
\fB-o\fP
The \-o flag takes an argument, the number of sectors at the beginning
of the exported file that are excluded from AoE export (default zero).
.TP
\fB-l\fP
The \-l flag takes an argument, the number of sectors to export.
Defaults to the file size in sectors minus the offset.
.SH EXAMPLE
In this example, the root user on a host named
.I nai
exports a file named "3TB" to the LAN on eth0 using AoE shelf address 11
and slot address 1. The process runs in the foreground. Using
.I vbladed
would have resulted in the process running as a daemon in the
background.
.IP
.EX
.nf
nai:~# vblade 11 1 eth0 /data/3TB
.fi
.EE
.SH NOTES AND WARNINGS
Users of Jumbo frames should read the README file distributed with
.I vblade
to learn about a workaround for kernel buffering limitations.
.PP
At least one AoE initiator (WinAoE) has been found to enforce legacy
CHS geometry for drives by discarding sectors. If you encounter
filesystem corruption, ensure that the size of the underlying regular
file or block device is a multiple of 8225280 bytes (255 heads,
63 sectors/track, 512 bytes/sector).
.SH REPORTING BUGS
Please report bugs to the aoetools-discuss mailing list.
.SH AUTHOR
Brantley Coile (brantley@coraid.com)
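
The CHS note in the man page above comes down to rounding the export size to a whole number of cylinders. A minimal sketch of that arithmetic follows; the function name chsround and the enum names are hypothetical.

```c
enum {
	Heads = 255,
	Sectors = 63,		/* sectors per track */
	Sectorsz = 512,		/* bytes per sector */
};

/* Round a size in bytes down to a whole number of CHS cylinders
 * (255 * 63 * 512 = 8225280 bytes each). */
long long
chsround(long long bytes)
{
	const long long cyl = (long long) Heads * Sectors * Sectorsz;

	return bytes - bytes % cyl;
}
```

Dividing the rounded size by 512 would give a sector count, which is the unit the \-l option is documented to take.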

Binary file not shown.

Binary file not shown.

@ -0,0 +1,6 @@
#! /bin/sh
# run a vblade daemon using a logger process
# output is directed to syslogd
#
sh -c "`dirname $0`/vblade $* < /dev/null 2>&1 | logger -t vbladed" &