Results for Sebastian Aaltonen's buffer tester https://github.com/sebbbi/perftest
From Intel Haswell GT2 (i3-4010U). It was necessary to reduce the threadgroup count from 1024 to 64x64, or the test would trigger a TDR.
Load R8 invariant: 2.106ms
Load R8 linear: 13.438ms
Load R8 random: 6.053ms
Load RG8 invariant: 2.105ms
Load RG8 linear: 12.763ms
Load RG8 random: 6.229ms
Load RGBA8 invariant: 2.105ms
#include <string>
#include <fstream>
#include <istream>
#include <sstream>
#include <boost/tokenizer.hpp>
#include <boost/timer/timer.hpp>
using namespace std;
static const __m128i SHUFFLE_TABLE[16] = {
    _mm_setr_epi8(12,13,14,15, 8, 9,10,11, 4, 5, 6, 7, 0, 1, 2, 3),
    _mm_setr_epi8( 0, 1, 2, 3,12,13,14,15, 8, 9,10,11, 4, 5, 6, 7),
    _mm_setr_epi8( 4, 5, 6, 7,12,13,14,15, 8, 9,10,11, 0, 1, 2, 3),
    _mm_setr_epi8( 0, 1, 2, 3, 4, 5, 6, 7,12,13,14,15, 8, 9,10,11),
    _mm_setr_epi8( 8, 9,10,11,12,13,14,15, 4, 5, 6, 7, 0, 1, 2, 3),
    _mm_setr_epi8( 0, 1, 2, 3, 8, 9,10,11,12,13,14,15, 4, 5, 6, 7),
    _mm_setr_epi8( 4, 5, 6, 7, 8, 9,10,11,12,13,14,15, 0, 1, 2, 3),
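
The seven rows visible above appear to follow one rule: treat the table index as a 4-bit mask over the four 32-bit lanes, move the lanes whose bit is set to the front in ascending order, and place the remaining lanes after them in descending order. The sketch below is only a reading of that pattern, not the author's generator. `MakeShuffleEntry` is a hypothetical helper, and the interpretation of a set bit as a "hit" lane is an assumption. It is plain scalar C++ so it can be checked without SSE intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds the 16 shuffle-control bytes for one 4-bit mask.
// Assumption: bit i set means 32-bit lane i belongs to the "front" group.
std::array<std::uint8_t, 16> MakeShuffleEntry(unsigned mask)
{
    int order[4];
    int n = 0;

    // Lanes with their bit set go first, in ascending lane order.
    for (int lane = 0; lane < 4; ++lane)
        if (mask & (1u << lane))
            order[n++] = lane;

    // Remaining lanes follow, in descending lane order.
    for (int lane = 3; lane >= 0; --lane)
        if (!(mask & (1u << lane)))
            order[n++] = lane;

    // Expand each lane index into the four byte indices it covers.
    std::array<std::uint8_t, 16> bytes{};
    for (int slot = 0; slot < 4; ++slot)
        for (int b = 0; b < 4; ++b)
            bytes[4 * slot + b] = static_cast<std::uint8_t>(4 * order[slot] + b);
    return bytes;
}
```

For mask 5 (lanes 0 and 2 set) this yields lanes 0, 2, 3, 1, which matches the sixth visible row. Whether the rows not shown above follow the same rule is not something the excerpt confirms.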
// Tried this, and it was marginally slower
//
// Some notes about this:
//  1. Separate hit/miss arrays force me to use a lot more stack than I did before, and
//       probably don't use the cache quite as well.
//  2. The prefetching of the rays doesn't fit in quite as neatly, and doesn't help anymore if I stick it in there;
//       it might make more sense to move that elsewhere anyway
//  3. LUT is 256 bytes. Not too bad, but it's probably knocking a few rays out of the cache
//  4. Reordering can produce at least one packet that is partially miss and partially hit.
static void __fastcall ReorderRays( StackFrame& frame, size_t nGroups )
{
    RayPacket** pPackets = frame.pActivePackets;
    uint32 pIDs[MAX_TRACER_SIZE];
    size_t nHitLoc = 0;
    size_t nMissLoc = 8*nGroups;
    const char* pRays = (const char*) frame.pRays;
The Microsoft compiler appears to ignore prefetches inside a loop.
Tested this on MSVC 2013 Express Edition. Microsoft's Connect site says I am not authorized to submit feedback, for who knows what reason, or else I'd send it there directly.
Code I used:
void Foo( char* p, int* q )
{
    for( size_t i=0; i<8; i++ )