pub fn buildPacket() []const u8 {
    // Define the packet components
    const dest_mac: [6]u8 = [_]u8{ 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF }; // Destination MAC
    const src_mac: [6]u8 = [_]u8{ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }; // Source MAC
    const ethertype: [2]u8 = [_]u8{ 0x08, 0x00 }; // EtherType (IPv4)
    const payload: [46]u8 = [_]u8{0} ** 46; // Payload (46 bytes of zeroes)

    // Combine all components into a single array
    const packet: [60]u8 = [_]u8{
        // Destination MAC
        dest_mac[0], dest_mac[1], dest_mac[2], dest_mac[3], dest_mac[4], dest_mac[5],
        // Source MAC
        src_mac[0], src_mac[1], src_mac[2], src_mac[3], src_mac[4], src_mac[5],
        // EtherType
        ethertype[0], ethertype[1],
        // Payload
        payload[0],  payload[1],  payload[2],  payload[3],
        payload[4],  payload[5],  payload[6],  payload[7],  payload[8],  payload[9],
        payload[10], payload[11], payload[12], payload[13], payload[14], payload[15],
        payload[16], payload[17], payload[18], payload[19], payload[20], payload[21],
        payload[22], payload[23], payload[24], payload[25], payload[26], payload[27],
        payload[28], payload[29], payload[30], payload[31], payload[32], payload[33],
        payload[34], payload[35], payload[36], payload[37], payload[38], payload[39],
        payload[40], payload[41], payload[42], payload[43], payload[44], payload[45],
    };

    // Return as a slice
    return packet[0..];
}
As time goes on things will become less procedural, and hopefully that buildPacket function will too. For now, spelling everything out makes it easy to drop in values ad hoc when needed, and it also tells the tale of the weirdness I have run into so far: I was hitting memcpy memory corruption when copying non-contiguous pieces of memory, so I unfortunately got to the point of auditing every single member of that array. Zig is a young language, so I keep bumping into things like that. The performance, however, speaks for itself. I could write my own tools to do what Zig does and build my own DSL, or I could just use Zig. A less procedural buildPacket could assemble the frame at compile time instead of listing every byte, as sketched below.
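As a rough idea of where that could go, here is a sketch (not the gist's current code) that concatenates the components at compile time, so the whole frame ends up as one contiguous constant and nothing non-contiguous ever needs to be copied:

// Sketch only: a less procedural take on buildPacket. The frame is assembled
// once at compile time with `++`, lives in static memory, and is returned as
// a slice of that constant.
const eth_frame = blk: {
    const dest_mac = [_]u8{0xFF} ** 6; // broadcast destination MAC
    const src_mac = [_]u8{0x00} ** 6; // zero source MAC
    const ethertype = [_]u8{ 0x08, 0x00 }; // EtherType: IPv4
    const payload = [_]u8{0} ** 46; // zero padding up to the 60-byte minimum
    break :blk dest_mac ++ src_mac ++ ethertype ++ payload;
};

pub fn buildPacketComptime() []const u8 {
    return &eth_frame;
}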
100 Gbps seems excessive, doesn't it? It turns out I was updating the tx and rx slot lengths simultaneously in some hideous mistake. Whenever you get astronomical values like that, it is worth digging in to see what is actually going on. Since writing this I had a power outage, and after bringing the machine back up nothing worked; there were still a few things wrong. I have since simplified the API to store the addresses of both rings and buffers as pointers, using anyopaque to store the buffers. This looks a bit more concise:
for (rng.slots.?.items) |slot| {
    const msg: []const u8 = buildPacket();
    std.debug.print("buf idx: {}\n", .{slot._slot.?.buf_idx});

    // View the slot's packet buffer as raw bytes.
    const bf: [*c]u8 = @alignCast(@ptrCast(slot._view.buf));
    const len: u16 = @truncate(msg.len);

    // Zero the region first, then copy the frame into the netmap buffer.
    std.crypto.secureZero(u8, bf[0..msg.len]);
    @memcpy(bf[0..msg.len], msg);

    // Tell netmap how many bytes this slot should transmit.
    slot._slot.?.len = len;
    std.debug.print("C String: {X}\n", .{bf[0..msg.len]});
    idx += 1;
}
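For context on what slot._slot and slot._view are pointing at: the wrapper keeps the kernel-shared netmap slot and the buffer address side by side, with the buffer held as anyopaque as described above. The field names below come from the snippet, but the actual type layout is my guess, not the project's definition:

const std = @import("std");
const c = @cImport({
    @cInclude("net/netmap_user.h");
});

// Guessed shape of the per-slot bookkeeping used by the loop above.
const SlotView = struct {
    buf: ?*anyopaque, // start of this slot's packet buffer in the mmap'd region
};

const Slot = struct {
    _slot: ?*c.struct_netmap_slot, // kernel-shared slot: buf_idx, len, flags
    _view: SlotView,
};

const Ring = struct {
    slots: ?std.ArrayList(Slot), // one entry per slot in the tx ring
};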
The Python speeds are as follows for tx.py:
870.232124 main_thread [2781] 17.326 Mpps (17.335 Mpkts 8.317 Gbps in 1000479 usec) 256.34 avg_batch 1 min_space
871.233117 main_thread [2781] 19.112 Mpps (19.131 Mpkts 9.174 Gbps in 1000994 usec) 256.22 avg_batch 1 min_space
872.234124 main_thread [2781] 17.408 Mpps (17.425 Mpkts 8.356 Gbps in 1001007 usec) 256.14 avg_batch 256 min_space
873.235122 main_thread [2781] 18.698 Mpps (18.717 Mpkts 8.975 Gbps in 1000998 usec) 256.34 avg_batch 1 min_space
874.236123 main_thread [2781] 19.919 Mpps (19.939 Mpkts 9.561 Gbps in 1001001 usec) 256.15 avg_batch 1 min_space
875.237122 main_thread [2781] 18.584 Mpps (18.603 Mpkts 8.921 Gbps in 1000999 usec) 256.10 avg_batch 1 min_space
876.238111 main_thread [2781] 19.494 Mpps (19.513 Mpkts 9.357 Gbps in 1000989 usec) 256.06 avg_batch 256 min_space
And now the new and improved Zig speed:
956.379630 main_thread [2781] 24.087 Mpps (24.112 Mpkts 11.562 Gbps in 1001033 usec) 256.05 avg_batch 0 min_space
957.380678 main_thread [2781] 24.205 Mpps (24.230 Mpkts 11.618 Gbps in 1001048 usec) 256.03 avg_batch 512 min_space
958.381724 main_thread [2781] 24.406 Mpps (24.431 Mpkts 11.715 Gbps in 1001047 usec) 256.02 avg_batch 512 min_space
959.382651 main_thread [2781] 24.374 Mpps (24.396 Mpkts 11.699 Gbps in 1000927 usec) 256.03 avg_batch 1 min_space
960.383717 main_thread [2781] 23.921 Mpps (23.946 Mpkts 11.482 Gbps in 1001066 usec) 256.04 avg_batch 256 min_space
961.384121 main_thread [2781] 24.063 Mpps (24.073 Mpkts 11.550 Gbps in 1000403 usec) 256.02 avg_batch 512 min_space
962.385148 main_thread [2781] 24.095 Mpps (24.120 Mpkts 11.566 Gbps in 1001027 usec) 256.01 avg_batch 256 min_space
963.386189 main_thread [2781] 22.208 Mpps (22.231 Mpkts 10.660 Gbps in 1001042 usec) 256.02 avg_batch 256 min_space
964.387235 main_thread [2781] 21.936 Mpps (21.959 Mpkts 10.529 Gbps in 1001046 usec) 256.02 avg_batch 1 min_space
965.388283 main_thread [2781] 23.325 Mpps (23.349 Mpkts 11.196 Gbps in 1001047 usec) 256.02 avg_batch 256 min_space
966.389328 main_thread [2781] 24.206 Mpps (24.232 Mpkts 11.619 Gbps in 1001046 usec) 256.02 avg_batch 1 min_space
As you can see they are nearly the same, because for once they are actually doing the same thing! Imagine that. After getting all excited earlier, I realized how suspicious it was that I was supposedly sending packets with a length of 60 and saturating the connection: at around 24 Mpps, 60-byte frames work out to roughly 11.5 Gbps, which matches the counters above, while 100 Gbps of 60-byte frames would take over 200 Mpps. A result like that simply isn't possible if both sides are doing exactly the same thing. I finally do get packets on the other end of the vale switch using onepacket.py as well:
Waiting for a packet to come
Received a packet with len 60
ffffffffffff000000000000080000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Waiting for a packet to come
Received a packet with len 60
ffffffffffff000000000000080000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Waiting for a packet to come
Received a packet with len 60
ffffffffffff000000000000080000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Waiting for a packet to come
Received a packet with len 60
ffffffffffff000000000000080000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The number one thing to remember is that netmap does not make your network go faster! Despite this, many applications built on it are all about doing things fast. Go figure.
Since departing from the Python code I also realized we could reuse a lemma for a fully safe initialization routine, so that will get added in as well. Grabbing hold of the memory addresses can be tricky. To initialize a ring, here is an example of what the netmap headers provide:
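(Reproduced approximately from net/netmap_user.h; the exact formatting in the header may differ.)

#define _NETMAP_OFFSET(type, ptr, offset) \
    ((type)(void *)((char *)(ptr) + (offset)))

#define NETMAP_IF(_base, _ofs) _NETMAP_OFFSET(struct netmap_if *, _base, _ofs)

#define NETMAP_TXRING(nifp, index) _NETMAP_OFFSET(struct netmap_ring *, \
    nifp, (nifp)->ring_ofs[index])

#define NETMAP_BUF(ring, index) \
    ((char *)(ring) + (ring)->buf_ofs + ((index) * (ring)->nr_buf_size))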
Zig uses cImport to deal with these things; at times the translation is not 100%, but it is good enough to get us through the rough areas:
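Pulling the header in looks something like this (the include path and the NETMAP_WITH_LIBS define are assumptions about how the project is set up, not taken from the gist):

// Assumed setup: bring the netmap userspace header in through cImport.
// NETMAP_WITH_LIBS enables the inline helpers in netmap_user.h.
const c = @cImport({
    @cDefine("NETMAP_WITH_LIBS", {});
    @cInclude("net/netmap_user.h");
});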
@"type" isn't actually valid and this function fails unfortunately and this is where we need to step in and fix things up. It gets most of the way there which is much better than what another programming language would provide. Zig actually tries to articulate all the intricacies here which is why I took up coding with it in the first place.